Getting a Raw Pointer to the end of a Container - c++

If I have the end iterator to a container but want a raw pointer to that position, is there a way to accomplish this?
Say I have a container: foo. I cannot for example do this: &*foo.end() because it yields the runtime error:
Vector iterator not dereferencable
I can do this, but I was hoping for a cleaner way to get there: &*foo.begin() + foo.size().
EDIT:
This is not a question about how to convert an iterator to a pointer in general (that is obviously part of it), but about how to convert specifically the end iterator to a pointer. The answers in the "duplicate" question actually suggest dereferencing the iterator, and the end iterator cannot be dereferenced at all: doing so is undefined behavior.

The correct way to access the end of storage is:
v.data() + v.size()
This is because *v.begin() is invalid when v is empty.
The member function data is provided for all contiguous containers (vector, string and array).
Since C++17 you can also use the non-member functions std::data and std::size (declared in <iterator>):
std::data(v) + std::size(v)
This works on raw arrays as well.
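Here is a short sketch of the C++17 spelling. It uses a hypothetical helper named end_ptr, plus a hypothetical end_ptr_example usage function, and it works the same way for vectors, strings, std::array and built-in arrays:

```cpp
#include <array>
#include <cassert>
#include <iterator>  // std::data, std::size (C++17)
#include <string>
#include <vector>

// Hypothetical helper: one-past-the-end pointer for anything std::data and
// std::size accept (vector, string, array, built-in arrays). Nothing is
// dereferenced, so an empty container is fine too.
template <typename C>
auto end_ptr(C& c) -> decltype(std::data(c) + std::size(c)) {
    return std::data(c) + std::size(c);
}

void end_ptr_example() {
    std::vector<int> v{1, 2, 3};
    assert(end_ptr(v) == v.data() + v.size());

    std::vector<int> empty;
    assert(end_ptr(empty) == empty.data());  // no *v.begin() involved

    int raw[4] = {};
    assert(end_ptr(raw) == raw + 4);  // built-in arrays work as well

    std::string s = "abc";
    std::array<char, 2> a{};
    assert(end_ptr(s) == s.data() + 3);
    assert(end_ptr(a) == a.data() + 2);
}
```

The calls are written as std::data and std::size on purpose: unqualified data(v) relies on ADL, which finds nothing for a built-in array.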

In general? No.
And the fact that you're asking indicates that something is wrong with your overall design.
For vectors, arrays, strings? Sure… but why?
Just get a pointer to the start of the storage and advance it:
std::vector<T> foo;
const T* ptr = foo.data() + foo.size();
As long as you don't dereference such a pointer (which is almost equivalent to dereferencing the iterator, as you did in your attempt), it is valid to obtain and hold it, because it points to the special one-past-the-end location.
Note that &foo[0] + foo.size() has undefined behaviour if the vector is empty: &foo[0] means &*(foo.data() + 0), i.e. &*foo.data(), and (just like in your attempt) *foo.data() is not allowed when there is no element there. So we avoid all dereferencing and simply advance foo.data() itself.
Anyway, this only works for vectors (see footnote 1), arrays and strings. Other containers do not guarantee (and cannot reasonably be expected to provide) contiguous storage; their end "pointers" could be almost anything, e.g. a sentinel null pointer, which is unlikely to be of any use to you.
That is why the iterator abstraction is there in the first place. Stick to it if you can, instead of delving into raw pointer usage.
1. Excepting std::vector<bool>.
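The same arithmetic also converts any vector iterator into a pointer without dereferencing it, and that includes the end iterator. Below is a minimal sketch. iter_to_ptr and iter_to_ptr_example are hypothetical names, not library functions:

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: converts any valid iterator into `v`, end() included,
// to the matching raw pointer. It never dereferences `it`; it only measures
// the offset from begin() and applies it to data(). It cannot be used with
// std::vector<bool>, which has no data().
template <typename T, typename Alloc>
T* iter_to_ptr(std::vector<T, Alloc>& v,
               typename std::vector<T, Alloc>::iterator it) {
    return v.data() + (it - v.begin());
}

void iter_to_ptr_example() {
    std::vector<int> v{10, 20, 30};
    assert(iter_to_ptr(v, v.begin() + 1) == &v[1]);
    assert(iter_to_ptr(v, v.end()) == v.data() + v.size());

    std::vector<int> empty;
    assert(iter_to_ptr(empty, empty.end()) == empty.data());
}
```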

Related

Can std::vector<T>::iterator simply be T*?

Simple theoretical question: would a simple pointer be a valid iterator type for std::vector?
For other containers (e.g. list, map), that would not be possible, but for std::vector the held data is guaranteed to be contiguous, so I see no reason why not.
As far as I know, some implementations (e.g. Visual Studio) perform safety checks in debug builds. But that is in UB territory, so for well-defined behavior I think there is no difference.
Apart from some checks (which "modify" what undefined behavior does), are there any advantages of using a class instead of a plain pointer for vector iterators?
would a simple pointer be a valid iterator type for std::vector?
Yes. And also for std::basic_string and std::array.
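The following sketch shows why a plain pointer qualifies. It checks that std::iterator_traits reports int* as a random-access iterator, and it runs std::sort directly on a vector's storage through pointers. It assumes C++17 (for std::is_same_v). sort_via_pointers and pointer_iterator_example are hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <type_traits>
#include <vector>

// A raw pointer already satisfies the random-access iterator requirements,
// which is why an implementation is allowed to use T* as vector<T>::iterator.
static_assert(std::is_same_v<std::iterator_traits<int*>::iterator_category,
                             std::random_access_iterator_tag>,
              "pointers are random-access iterators");

// Standard algorithms accept the vector's storage through plain pointers too.
void sort_via_pointers(std::vector<int>& v) {
    std::sort(v.data(), v.data() + v.size());
}

void pointer_iterator_example() {
    std::vector<int> v{3, 1, 2};
    sort_via_pointers(v);
    assert((v == std::vector<int>{1, 2, 3}));
}
```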
are there any advantages of using a class instead of a simple pointer for vector iterators?
It offers some additional type safety, so that logic errors like the following don't compile:
std::vector<int> v;
int i = 0;
int* p = &i;
v.insert(p, 1);   // oops, not an iterator!
delete v.begin(); // oops!
std::string s;
std::vector<char> cv;
// compiles if string and vector both use pointers for iterators:
cv.insert(s.begin(), '?');
std::array<char, 2> a;
// compiles if array and vector both use pointers for iterators:
cv.erase(a.begin());
Yes, it can be T*, but that has the slightly annoying property that std::vector<int>::iterator is then no longer associated with namespace std for ADL (a pointer to a fundamental type has no associated namespaces at all). So an unqualified swap(iter1, iter2) may fail to find std::swap.
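Generic code usually works around this ADL gap with the "using std::swap;" idiom. Here is a minimal sketch, with swap_iterators and swap_example as hypothetical names:

```cpp
#include <cassert>
#include <utility>  // std::swap
#include <vector>

// The usual generic idiom: the using-declaration makes std::swap visible even
// when ADL finds nothing (e.g. if an iterator is a raw pointer like int*),
// while still letting ADL choose a type-specific swap when one exists.
template <typename It>
void swap_iterators(It& a, It& b) {
    using std::swap;
    swap(a, b);
}

void swap_example() {
    std::vector<int> v{1, 2, 3};
    auto first = v.begin();
    auto last = v.end();
    swap_iterators(first, last);
    assert(first == v.end() && last == v.begin());

    int x = 0, y = 1;
    int* p = &x;
    int* q = &y;
    swap_iterators(p, q);  // pointer "iterators" work the same way
    assert(p == &y && q == &x);
}
```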
Food for thought: an iterator class can also be implemented in terms of indexes instead of pointers.
Of course, when a vector reallocates, all pointers, references and iterators into it are invalidated.
But for iterators that doesn't always have to be the case. If the iterator holds an index plus a pointer to the vector, you can create iterators that are not invalidated by reallocation and simply return (*m_vector)[m_index]. Such an iterator is invalid only when the vector is destroyed or the index is out of range; in other words, it is invalid exactly when the expression vec[i] is invalid, regardless of reallocations.
This is a strictly non-standard implementation of a vector iterator, but it is nonetheless an advantage of class-based iterators over raw pointers.
Also, dereferencing an invalid iterator is undefined behavior, so the standard doesn't say what should happen; throwing an exception or logging an error are both allowed.
That means an iterator is free to do bounds checking. Such an iterator is significantly slower, but in cases where speed is not important, a "safe but slow" iterator may have more advantages than an "unsafe but fast" one.
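Below is a minimal sketch of the index-plus-pointer idea. It uses a hypothetical index_iterator class that implements only dereference, pre-increment and comparison, so it is not a complete standard iterator. It dereferences through vector::at, which means an out-of-range position throws std::out_of_range. That illustrates the "safe but slow" option:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>  // std::out_of_range, thrown by vector::at
#include <vector>

// Hypothetical, non-standard iterator holding a vector pointer plus an index.
// Every dereference re-indexes the vector, so it survives reallocation, and
// at() makes an invalid position throw instead of reading stray memory.
template <typename T>
class index_iterator {
public:
    index_iterator(std::vector<T>& v, std::size_t i) : vec_(&v), idx_(i) {}

    T& operator*() const { return vec_->at(idx_); }  // bounds-checked
    index_iterator& operator++() { ++idx_; return *this; }
    bool operator==(const index_iterator& o) const {
        return vec_ == o.vec_ && idx_ == o.idx_;
    }
    bool operator!=(const index_iterator& o) const { return !(*this == o); }

private:
    std::vector<T>* vec_;
    std::size_t idx_;
};

void index_iterator_example() {
    std::vector<int> v{1, 2, 3};
    index_iterator<int> it(v, 1);
    for (int k = 0; k < 1000; ++k) v.push_back(k);  // forces reallocation
    assert(*it == 2);  // a T* or ordinary vector iterator would now dangle

    index_iterator<int> past(v, v.size());
    bool threw = false;
    try {
        *past;
    } catch (const std::out_of_range&) {
        threw = true;
    }
    assert(threw);
}
```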

How is x = &(*variable) different from x = variable?

I've been trying to understand C++ by going over some projects, and I came across this:
vector<Circle>::iterator circleIt = mCircles.end();
..
mCurrentDragCircle = &(*circleIt);
Why would you dereference and then reference it again?
The * operator is overloaded for the iterator class, so it doesn't do a plain pointer dereference. Instead, it returns a reference to the element the iterator currently refers to. Applying the address-of operator to that reference gives a pointer to the element.
Using mCurrentDragCircle = circleIt; would assign the iterator to your field.
circleIt is an iterator, and iterators overload operator* so that they look like pointers. Dereferencing an iterator gives a reference to the value; & turns that into a pointer to the value. So this is converting an iterator to a pointer.
Btw., dereferencing past-the-end, i.e. *v.end(), has undefined behavior. The proper way to do this is
v.data() + v.size()
in C++11, or
&v[0] + v.size()
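in C++98. Only the C++98 form requires the vector to be non-empty, because &v[0] is undefined for an empty vector; v.data() + v.size() is also fine when v is empty.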
Because of how iterators work.
*circleIt
returns a reference to a Circle instance, e.g. you can write
Circle & c = *circleIt;
Then you take the address of that circle and store it in a pointer:
Circle * mCurrentDragCircle = & (*circleIt);
Hope this helps.
&* is a particularly pernicious idiom for extracting an underlying pointer. It's frequently used on objects that have an overloaded dereference operator *.
One such class of objects are iterators. The author of your class is attempting to get the address of the underlying datum for some reason. (There are two pitfalls here: (i) *end() gives undefined behaviour, and (ii) the author appears to be storing the pointer value, blissfully unaware that modifications to the container could invalidate it.)
You can also see it used with smart pointers to circumvent reference counting.
My advice: avoid if at all possible.
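One way to avoid pitfall (ii), the stored pointer, is to remember a position instead of an address. The sketch below is only an illustration under some assumptions. Circle is a minimal stand-in for the question's type, and DragState and drag_example are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Circle { int x = 0; int y = 0; };  // hypothetical stand-in

// Hypothetical alternative to caching &*circleIt: keep the container and an
// index, and look the element up on each use. A cached Circle* dangles after
// push_back reallocates; the index stays valid as long as nothing is
// inserted or erased in front of it.
struct DragState {
    std::vector<Circle>* circles = nullptr;
    std::size_t index = 0;

    Circle& current() const {
        assert(circles != nullptr && index < circles->size());
        return (*circles)[index];
    }
};

void drag_example() {
    std::vector<Circle> circles(1);
    DragState drag{&circles, 0};
    for (int k = 0; k < 100; ++k) circles.push_back(Circle{k, k});
    drag.current().x = 42;  // still refers to circles[0] after reallocation
    assert(circles[0].x == 42);
}
```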

dereference a pointer and then take the address of dereference

I was reading the STL source, and one use of a pointer puzzles me:
destroy(&*first);
If first is a pointer, then &*first is equal to first, so why not use first directly?
destroy is declared as follows:
void destroy(T* pointer)
T is a template parameter.
This is most likely due to operator overloading. first is the name typically given to iterators, which overload operator* to return a reference to the container element they point to; operator& is then used to get the address of that element.
However, if first is a pointer and not a user defined iterator, then yes you can just use first directly. Note that pointers can be iterators (specifically, RandomAccessIterators).
If the argument is already a raw pointer, then &* combination does nothing, as you already noted. However, if the argument is a class type with overloaded unary * operator, then &* is no longer a no-op.
The classic usage of &* pattern is to convert a "generic" pointer into a raw pointer. For example, if it is an iterator associated with a vector element, then &*it will give you an ordinary raw pointer to that vector element.
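Here is a minimal sketch of the &* pattern the way generic code might use it. It assumes C++14 for the deduced return type, and to_raw and to_raw_example are hypothetical names. The sketch calls std::addressof in place of a bare &. This does the same job, but an overloaded operator& on the element type cannot get in the way:

```cpp
#include <cassert>
#include <memory>  // std::addressof
#include <vector>

// Hypothetical helper in the spirit of destroy(&*first): takes a raw pointer
// or any class-type iterator and returns a plain pointer to the element.
// The iterator must refer to an element (never pass end()).
template <typename It>
auto to_raw(It it) {
    return std::addressof(*it);
}

void to_raw_example() {
    std::vector<int> v{1, 2, 3};
    int* p = to_raw(v.begin() + 1);  // vector iterator -> int*
    assert(p == v.data() + 1);
    assert(to_raw(p) == p);  // for a raw pointer, &*p is just p
}
```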

Converting between pointers and references

std::list<Value> stdList;
stdList.push_back(Value());
Value * ptr = &stdList.back(); // <-- what will this address point to?
If I take the reference returned by back() and implicitly convert it to the less generic Value *, will it point to the last value of the list, or will it point to someplace unexpected?
And is there a way to create an iterator from a pointer, for use with std::list functions such as erase()? I realize generic to specific (iterator to pointer) is far more feasible than going the other direction, but I thought I'd ask anyway.
The pointer will point to the value as it is stored inside the container. The reference did the same thing.
You can't turn that pointer into a list iterator directly, because you've lost all the information about the surrounding structure. To do this you would have to search the list, e.g. with std::find_if, comparing element addresses.
What you are trying to do is sometimes done with vector. You can turn a pointer to a vector element into an iterator (an array index) because you know the structure of a vector.
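Going from a pointer back to an iterator can be sketched as follows. For std::list this is a linear search over element addresses. For std::vector it is constant-time arithmetic on data(). list_iter_from_ptr, vec_iter_from_ptr and iter_from_ptr_example are hypothetical helpers, not std::list or std::vector members:

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <memory>  // std::addressof
#include <vector>

// Hypothetical helper: recover a std::list iterator from a pointer to one of
// its elements by an O(n) search comparing element addresses. Returns end()
// if p does not point into l.
template <typename T>
typename std::list<T>::iterator list_iter_from_ptr(std::list<T>& l, const T* p) {
    return std::find_if(l.begin(), l.end(),
                        [p](const T& x) { return std::addressof(x) == p; });
}

// Hypothetical helper for std::vector: the layout is known, so the conversion
// is O(1). p must point into v, or be v.data() + v.size() (which gives end()).
template <typename T>
typename std::vector<T>::iterator vec_iter_from_ptr(std::vector<T>& v, const T* p) {
    return v.begin() + (p - v.data());
}

void iter_from_ptr_example() {
    std::list<int> l{1, 2, 3};
    int* last = &l.back();
    auto it = list_iter_from_ptr(l, last);
    assert(it != l.end() && *it == 3);
    l.erase(it);  // the recovered iterator works with list::erase
    assert(l.size() == 2 && l.back() == 2);

    std::vector<int> v{4, 5, 6};
    assert(vec_iter_from_ptr(v, &v[1]) == v.begin() + 1);
}
```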
Please note that list::back() does not return an iterator. It returns a reference. The two are quite different. Are you thinking about list::end()? Or are you genuinely confused between iterators and references? Because you can get a reference from a pointer. You do it like this:
Value& refval = *ptr;
Yes, the pointer points to the last value stored in the list.
We cannot create an iterator from a pointer: an iterator is a concept belonging to the container (the list here), and it is concerned with the container's structure, not with the kind of value stored in the list.
References and pointers are both handles to the value stored in the list; they are interchangeable, since we can convert a reference to a pointer and vice versa.

Curious behaviour of std::string::operator[] in MSVC

I've been using some semi-iterators to tokenize a std::string, and I've run into a curious problem with operator[]. When constructing a new string from a position using char*, I've used something like the following:
t.begin = i;
t.end = i + 1;
t.contents = std::string(&arg.second[t.begin], &arg.second[t.end]);
where arg.second is a std::string. But if i is the position of the last character, then arg.second[t.end] triggers a debug assertion, even though taking a pointer one past the end is well-defined behaviour and even common for primitive arrays. And since the constructor is being called with iterators, I know that the end iterator will never be dereferenced. Doesn't it seem logical that arg.second[arg.second.size()] should be a valid expression, producing the equivalent of arg.second.end() as a char*?
You're not taking a pointer to one past the end; you're ACCESSING one past the end and then taking the address of that. These are entirely different: the former is well defined and well formed, the latter is neither. I suggest using the iterator constructor, which is basically what you ARE using, but with iterators instead of char*. See Alexandre's comment.
In C++03, operator[](size_type pos) const doesn't return one-past-the-end if pos == size(); it returns charT(), which is a temporary. In the non-const version of operator[], the behavior is undefined. (C++11 changed this: for pos == size(), both versions return a reference to a charT() value that must not be modified.)
21.3.4/1
const_reference operator[](size_type pos) const;
reference operator[](size_type pos);
1 Returns: If pos < size(), returns data()[pos]. Otherwise, if pos == size(), the const
version returns charT(). Otherwise, the behavior is undefined.
What is well-defined is creating an iterator one past the end. (Pointers might be iterators, too.) However, dereferencing such an iterator will yield Undefined Behavior.
Now, what you're doing is array subscripting, and that is very different from forming iterators, because it returns a reference to the referred-to object (much akin to dereferencing an iterator). You are certainly not allowed to access an array one-past-the-end.
std::string is not an array. It is an object whose interface loosely resembles an array (namely, it provides operator[]). But that's where the similarity ends.
Even if we assume for a second that std::string is just a wrapper built on top of an ordinary array, to obtain the one-past-the-end pointer for the stored sequence you would have to do something like &arg.second[0] + t.end. That is, instead of going through the std::string interface, you first move into the domain of ordinary pointers and use ordinary low-level pointer arithmetic.
However, even that assumption is not correct under C++03, and doing something like &arg.second[0] + t.end is a recipe for disaster there. C++03 does not guarantee that std::string stores its controlled sequence as an array, or even contiguously, so regardless of where your pointers point, you cannot assume that you'll be able to iterate from one to another using pointer arithmetic. (C++11 added the contiguity guarantee.)
If you want to use an std::string in some legacy pointer-based interface the only choice you have is to go through the std::string::c_str() method, which will generate a non-permanent array-based copy of the controlled sequence.
P.S. Note, BTW, that in the original C and C++ specifications it is illegal to use the &a[N] method to obtain the one-past-the-end pointer even for an ordinary built-in array. You always have to make sure that you are not using the [] operator with past-the-end index. The legal way to obtain the pointer has always been something like a + N or &a[0] + N, but not &a[N]. Recent changes legalized the &a[N] approach as well, but nevertheless originally it was not legal.
A string is not a primitive array, so I'd say the implementation is free to add some debug diagnostics if you are doing something dangerous like accessing elements outside its range. I would guess that a release build will probably work.
But...
For what you are trying to do, why not just use the basic_string(const basic_string& str, size_type index, size_type length) constructor to create the substrings?
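For the tokenizing case in the question, the sketch below builds the substring for positions [b, e) in three ways, and none of them indexes one past the end. token_ctor, token_substr, token_iters and token_example are hypothetical names. The function bodies use only std::string's standard constructors and substr:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Three ways to build the token covering positions [b, e) of s without ever
// forming &s[e]. All three assume b <= e <= s.size().

// The (str, pos, count) constructor.
std::string token_ctor(const std::string& s, std::size_t b, std::size_t e) {
    return std::string(s, b, e - b);
}

// Equivalent, via substr.
std::string token_substr(const std::string& s, std::size_t b, std::size_t e) {
    return s.substr(b, e - b);
}

// The iterator-pair constructor: s.begin() + e may equal s.end(), which is
// fine because the constructor never dereferences its end iterator.
std::string token_iters(const std::string& s, std::size_t b, std::size_t e) {
    using diff = std::string::difference_type;
    return std::string(s.begin() + static_cast<diff>(b),
                       s.begin() + static_cast<diff>(e));
}

void token_example() {
    const std::string s = "abc";
    // The case from the question: the token is the last character.
    assert(token_ctor(s, 2, 3) == "c");
    assert(token_substr(s, 2, 3) == "c");
    assert(token_iters(s, 2, 3) == "c");
}
```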