Reversing an on-demand iterator


I have an iterator, DataIterator, that produces values on demand, so its dereference operator returns a Data rather than a Data&. I thought that was an OK thing to do until I tried to iterate backwards by wrapping a DataIterator in a reverse_iterator.
DataCollection collection;
std::reverse_iterator<DataIterator> rBegin(iter); // iter is a DataIterator that's part-way through the collection
std::reverse_iterator<DataIterator> rEnd(collection.cbegin());
auto Found = std::find_if(
rBegin,
rEnd,
[](const Data& candidate){
return candidate.Value() == 0x00;
});
When I run the above code, it never finds a Data object whose value is equal to 0, even though I know one exists. When I stick a breakpoint inside the predicate, I see weird values that I would never expect to see like 0xCCCC - probably uninitialized memory. What happens is that the reverse_iterator's dereference operator looks like this (from xutility - Visual Studio 2010)
Data& operator*() const
{ // return designated value
DataIterator _Tmp = current;
return (*--_Tmp); //Here's the problem - the * operator on DataIterator returns a value instead of a reference
}
The last line is where the problem is: a temporary Data gets created and a reference to that temporary gets returned. The reference is dangling as soon as the function returns.
If I change my predicate in std::find_if to take (Data candidate) instead of (const Data& candidate) then the predicate works - but I'm pretty sure I'm just getting lucky with undefined behavior there. The reference is invalid, but I'm making a copy of the data before the memory gets clobbered.
What can I do?
Fix my DataIterator so that operator* returns a Data& instead of a Data? I don't really see how this is possible. The whole point of my DataIterator returning a Data instead of a Data& is that I don't have room to hold the entire uncompressed data set in memory, so I create the items that you want to look at on demand. Maybe I could hold onto the 'current' data value, but then that reference is going to become invalid the moment you increment or decrement the DataIterator. Edit: one of the answers suggests a shared_ptr.
Write a specialization of the reverse_iterator and make its dereference operator return a value instead of a reference? This seems like a frustrating amount of work, but understandable since it's my DataIterator that's not playing nice here - not the rest of the STL.
Along the same lines, maybe make a find_if that goes in reverse - probably less work than specializing reverse_iterator.
Something else I haven't thought of
Is there something I can do to DataIterator that will prevent someone else from blowing half a day figuring out what's wrong when they try the same thing 6 months from now?

Not that I'm a great fan of the idea, but if you heap allocated a Data object and then returned a ref to a shared_ptr to it, that would allow the outside world to hold onto it longer if needed, and for you to "forget" about it when you step forward.
On the other hand, implementing your own native reverse_iterator might be a bigger win. That's what I did for my own linked list since I didn't use a sentinel object like gcc does and couldn't use std::reverse_iterator. It really wasn't that difficult.

It's because the reverse_iterator interface was designed before the existence of decltype. Today, that would be written as
auto operator*() const -> decltype(*current)
{ // return designated value
DataIterator _Tmp = current;
return (*--_Tmp);
}
and in C++14, even the trailing return type won't be needed, since it can be inferred.
decltype(auto) operator*() const
{ // return designated value
DataIterator _Tmp = current;
return (*--_Tmp);
}
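To see the deduced return type at work, here is a minimal sketch of a hand-rolled reverse adaptor (C++14 or later). The names ReverseAdaptor, CountingIterator, and contains_backwards are made up for illustration, and Data is only a stand-in for the question's type; this is not the library's implementation.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the question's Data type.
struct Data {
    int value;
    int Value() const { return value; }
};

// Hypothetical on-demand iterator: builds each Data when dereferenced.
struct CountingIterator {
    int index;
    Data operator*() const { return Data{index * 10}; }  // returns by value
    CountingIterator& operator++() { ++index; return *this; }
    CountingIterator& operator--() { --index; return *this; }
    bool operator==(const CountingIterator& o) const { return index == o.index; }
    bool operator!=(const CountingIterator& o) const { return index != o.index; }
};

// Minimal reverse adaptor; only what a hand-written loop needs.
template <typename Iter>
class ReverseAdaptor {
public:
    explicit ReverseAdaptor(Iter it) : current_(it) {}

    // decltype(auto) keeps whatever *--tmp yields: Data here, T& for a pointer.
    decltype(auto) operator*() const {
        Iter tmp = current_;
        return *--tmp;
    }
    ReverseAdaptor& operator++() { --current_; return *this; }
    bool operator!=(const ReverseAdaptor& o) const { return current_ != o.current_; }

private:
    Iter current_;
};

// For a by-value iterator the adaptor also returns by value, so nothing dangles.
static_assert(
    std::is_same<decltype(*std::declval<const ReverseAdaptor<CountingIterator>&>()),
                 Data>::value,
    "operator* should return Data by value");

// Scan backwards from `from` down to `first` for a Data whose Value() is `wanted`.
inline bool contains_backwards(CountingIterator first, CountingIterator from, int wanted) {
    assert(first.index <= from.index);  // precondition: a valid range
    ReverseAdaptor<CountingIterator> it(from), end(first);
    for (; it != end; ++it) {
        if ((*it).Value() == wanted) return true;
    }
    return false;
}
```

With a plain pointer as Iter the same operator* deduces T&, so ordinary containers still get a real reference.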

I ended up going with Casey's suggestion in the comments. He didn't post it as an answer that I can accept, so I'll write it up myself.
I made a specialization of the reverse_iterator for DataIterator that returns a value instead of a reference. This involved copy/pasting the implementation from xutility, specifying one of the template arguments to be a DataIterator, and changing
reference operator*() const
to
value operator*() const
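A possible alternative to copying the library's code, assuming you control DataIterator's member types: std::reverse_iterator declares operator* as returning the wrapped iterator's `reference` type, so if DataIterator says its `reference` is Data (not Data&), the stock adaptor returns by value too. The sketch below uses a made-up DataIterator that "decompresses" element i as i - 3; the helper find_zero_backwards is hypothetical as well. Strictly speaking, the pre-C++20 requirements for bidirectional iterators expect `reference` to be a real reference, so this leans on what common implementations do rather than on a guarantee.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <type_traits>

// Hypothetical stand-in for the question's Data type.
struct Data {
    int value;
    int Value() const { return value; }
};

// Hypothetical on-demand iterator: element i is produced as i - 3.
class DataIterator {
public:
    using iterator_category = std::bidirectional_iterator_tag;
    using value_type        = Data;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const Data*;
    using reference         = Data;  // by value: reverse_iterator picks this up

    DataIterator() = default;
    explicit DataIterator(int index) : index_(index) {}

    reference operator*() const { return Data{index_ - 3}; }
    DataIterator& operator++() { ++index_; return *this; }
    DataIterator operator++(int) { DataIterator old(*this); ++index_; return old; }
    DataIterator& operator--() { --index_; return *this; }
    DataIterator operator--(int) { DataIterator old(*this); --index_; return old; }
    friend bool operator==(const DataIterator& a, const DataIterator& b) {
        return a.index_ == b.index_;
    }
    friend bool operator!=(const DataIterator& a, const DataIterator& b) {
        return !(a == b);
    }
    int index() const { return index_; }

private:
    int index_ = 0;
};

static_assert(
    std::is_same<std::reverse_iterator<DataIterator>::reference, Data>::value,
    "reverse_iterator should dereference to Data by value");

// Search backwards from `from` towards `first` for a zero-valued Data.
// Returns the index of the match, or -1 if there is none.
inline int find_zero_backwards(DataIterator first, DataIterator from) {
    assert(first.index() <= from.index());  // precondition: a valid range
    std::reverse_iterator<DataIterator> rBegin(from), rEnd(first);
    auto found = std::find_if(rBegin, rEnd, [](const Data& candidate) {
        return candidate.Value() == 0x00;
    });
    // base() is one past the element that the reverse iterator designates.
    return found == rEnd ? -1 : found.base().index() - 1;
}
```

The predicate can keep taking const Data&: it now binds to the temporary returned by value, which lives until the call to the predicate finishes.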

Related

Input Iterator - Star and Postfix-Operator

Is it valid to do this on an input iterator *it++ ?
I understand the code as follows: it dereferences the iterator, gives me the value, and then steps one ahead.
According to the C++ reference, the * operator has lower precedence than the postfix operator: http://en.cppreference.com/w/cpp/language/operator_precedence
But I read this style is bad practice. Why?

Is it valid to do this on an input iterator *it++?
Yes, that is valid. The iterator will be incremented, its previous value will be returned, and you will dereference that returned iterator.
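As a small illustration of the idiom on a genuine input iterator, here is a sketch using std::istream_iterator. The function names read_ints and read_ints_usage are made up for this example.

```cpp
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Read whitespace-separated integers using *it++ on an input iterator.
inline std::vector<int> read_ints(const std::string& text) {
    std::istringstream in(text);
    std::istream_iterator<int> it(in), end;
    std::vector<int> out;
    while (it != end) {
        out.push_back(*it++);  // parsed as *(it++): take the current value, then advance
    }
    return out;
}

// Usage sketch.
inline void read_ints_usage() {
    const std::vector<int> v = read_ints("10 20 30");
    assert(v.size() == 3);
    assert(v[0] == 10 && v[1] == 20 && v[2] == 30);
}
```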
But I read this style is bad practice. Why?
Consider these two implementations I've just pulled out of some graph code I wrote a while back:
// Pre-increment
BidirectionalIterator& operator++()
{
edge = (edge->*elem).next;
return *this;
}
// Post-increment
BidirectionalIterator operator++(int)
{
TargetIterator oldval(list, edge);
edge = (edge->*elem).next;
return oldval;
}
Notice that for post-increment, I need to first construct a new iterator to store the previous value which will be returned.
If it's simple and clear to write your program to make use of pre-increment, there may be better performance, less work for the compiler to do, or both.
Just don't go crazy on this (for example, rewriting all your loops)! That would likely be micro-optimization. However, the reason people say it's good practice is that if you get into a habit of using pre-increment as default then you get (potentially) better performance by default.

C++ STL - Why treat a function that returns an iterator as a void function?

The STL has many functions that return iterators. For example, the STL list function erase returns an iterator (both C++98 and C++11). Nevertheless, I see it being used as a void function. Even the cplusplus.com site has an example that contains the following code:
mylist.erase(it1, it2);
which does not return an iterator. Shouldn't the proper syntax be as follows?
std::list<int>::iterator iterList = mylist.erase(it1, it2)?

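You are not forced to use a return value in C++. Normally the returned iterator is useful because it gives you a valid iterator into the container after the erasure. But syntactically it's correct to ignore it. Just keep in mind that after the erasure, it1 and any other iterators to the erased elements are no longer valid; for a std::list, it2 is not part of the erased range and stays valid.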

which does not return an iterator. Shouldn't the proper syntax be as follows?
It does return an iterator, but just as with any other function you can ignore the returned value. If you just want to erase an element and don't care about the returned iterator, then you may simply ignore it.
Actually ignoring the return value is more common than you might expect. Maybe the most common place where return values are ignored is when the only purpose of the return value is to enable chaining. For example the assignment operator is usually written as
Foo& operator=(const Foo& other){
/*...*/
return *this;
}
so that you can write something like a = b = c. However, when you only write b = c you usually ignore the value returned by that call. Also note, that when you write b = c this does return a value, but if you just want to make this assignment, then you have no other choice than to ignore the return value.

The actual reason is far more banal than you might think. If you couldn't ignore a return value, many of the functions like .erase() would need to be split in two versions: an .erase() version returning void and another .erase_and_return() version which returned the iterator. What would you gain from this?

Return values are things that are sometimes useful. If they are not useful, you don't have to use them.
All container.erase functions return the iterator after the newly erased range. For non-node-based containers this is often (but not always) useful because the iterators at and after the range are no longer valid.
For node-based containers this is usually useless, as the 2nd iterator passed in remains valid even after the erase operation.
Regardless, both return that iterator. This permits code that works on a generic container to not have to know if the container maintains valid iterators after the erase or not; it can just store the return value and have a valid iterator to after-the-erase.
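A short sketch of that kind of generic code: the same loop works for a vector (where erase invalidates the iterator) and a list (where it would not matter), because it always continues from the returned iterator. The names erase_even and erase_even_usage are made up for this example.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Remove every even element from a standard sequence container.
template <typename Container>
void erase_even(Container& c) {
    for (auto it = c.begin(); it != c.end(); ) {
        if (*it % 2 == 0) {
            it = c.erase(it);  // erase returns the iterator following the removed element
        } else {
            ++it;
        }
    }
}

// Usage sketch.
inline void erase_even_usage() {
    std::vector<int> v = {1, 2, 3, 4, 6, 7};
    erase_even(v);
    assert(v.size() == 3);
    assert(v[0] == 1 && v[1] == 3 && v[2] == 7);
}
```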
Iterators in C++ standard containers are all extremely cheap to create and destroy and copy; in fact, they are in practice so cheap that compilers can eliminate them entirely if they aren't used. So returning an iterator that isn't used can have zero run time cost.
A program that doesn't use this return value here can be a correct program, both semantically and syntactically. At the same time, other semantically and syntactically correct programs will require that you use that return value.
Finally,
mylist.erase(it1, it2);
this does return an iterator. The iterator is immediately discarded (the return value only exists as an unnamed temporary), and compilers are likely to optimize it out of existence.
But just because you don't store a return value, doesn't mean it isn't returned.

Is it wise to use a pointer to access values in an std::map

Is it dangerous to return a pointer to the data out of a std::map::find and use that, as opposed to getting a copy of the data?
Currently, I get a pointer to an entry in my map and pass it to another function to display the data. I'm concerned about items moving and causing the pointer to become invalid. Is this a legit concern?
Here is my sample function:
MyStruct* StructManagementClass::GetStructPtr(int structId)
{
std::map<int, MyStruct>::iterator foundStruct;
foundStruct= myStructList.find(structId);
if (foundStruct== myStructList.end())
{
MyStruct newStruct;
memset(&newStruct, 0, sizeof(MyStruct));
newStruct.structId= structId;
myStructList.insert(pair<int, MyStruct>(structId, newStruct));
foundStruct= myStructList.find(structId);
}
return (MyStruct*) &foundStruct->second;
}

It would undoubtedly be more typical to return an iterator than a pointer, though it probably makes little difference.
As far as remaining valid goes: a map iterator remains valid until/unless the item it refers to is removed/erased from the map.
When you insert or delete some other node in the map, that can result in the nodes in the map being rearranged. That's done by manipulating the pointers between the nodes though, so it changes what other nodes contain pointers to the node you care about, but does not change the address or content of that particular node, so pointers/iterators to that node remain valid.
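A quick way to convince yourself, as a sketch: take a pointer to one element, then insert and erase plenty of other keys, and check that the pointer still refers to the same element. The function name and the key/value choices here are made up.

```cpp
#include <cassert>
#include <map>
#include <string>

// Returns true if a pointer to one map element still refers to the same,
// unchanged element after other keys are inserted and erased.
inline bool pointer_survives_other_changes() {
    std::map<int, std::string> m;
    m[42] = "keep";
    std::string* p = &m.find(42)->second;

    for (int i = 0; i < 1000; ++i) m[i + 100] = "other";  // may rebalance the tree
    for (int i = 0; i < 500; ++i) m.erase(i + 100);       // erases other nodes only

    std::map<int, std::string>::iterator it = m.find(42);
    assert(it != m.end());  // key 42 itself was never erased
    return p == &it->second && *p == "keep";
}
```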

As long as you, your code, and your development team understand the lifetime of std::map values ( valid after insert, and invalid after erase, clear, assign, or operator= ), then using an iterator, const_iterator, ::mapped_type*, or ::mapped_type const* are all valid. Also, if the return is always guaranteed to exist, then ::mapped_type&, or ::mapped_type const& are also valid.
As for wise, I'd prefer the const versions over the mutable versions, and I'd prefer references over pointers over iterators.
Returning an iterator instead of a pointer is worse, because:
it exposes an implementation detail.
it is awkward to use, as the caller has to know to dereference the iterator, that the result is an std::pair, and that one must then call .second to get the actual value.
.first is the key that the user may not care about.
determining if an iterator is invalid requires knowledge of ::end(), which is not obviously available to the caller.

It's not dangerous - the pointer remains valid just as long as an iterator or a reference does.
However, in your particular case, I would argue that it is not the right thing anyway. Your function unconditionally returns a result. It never returns null. So why not return a reference?
Also, some comments on your code.
std::map<int, MyStruct>::iterator foundStruct;
foundStruct = myStructList.find(structId);
Why not combine declaration and assignment into initialization? Then, if you have C++11 support, you can just write
auto foundStruct = myStructList.find(structId);
Then:
myStructList.insert(pair<int, MyStruct>(structId, newStruct));
foundStruct = myStructList.find(structId);
You can simplify the insertion using make_pair. You can also avoid the redundant lookup, because insert returns an iterator to the newly inserted element (as the first element of a pair).
foundStruct = myStructList.insert(make_pair(structId, newStruct)).first;
Finally:
return (MyStruct*) &foundStruct->second;
Don't ever use C-style casts. It might not do what you expect. Also, don't use casts at all when they're not necessary. &foundStruct->second already has type MyStruct*, so why insert a cast? The only thing it does is hide a place that you need to change if you ever, say, change the value type of your map.
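Putting those comments together, the function could look roughly like the sketch below. MyStruct's members are guesses (the question does not show them), and GetStruct and size are names chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>

// Hypothetical stand-in for the question's MyStruct.
struct MyStruct {
    int structId;
    int payload;
};

class StructManagementClass {
public:
    // Returns the entry for structId, creating a zeroed one if it is missing.
    MyStruct& GetStruct(int structId) {
        auto found = myStructList.find(structId);
        if (found == myStructList.end()) {
            MyStruct newStruct = MyStruct();  // value-initialization zeroes the members
            newStruct.structId = structId;
            // insert returns pair<iterator, bool>; .first is the new element
            found = myStructList.insert(std::make_pair(structId, newStruct)).first;
        }
        assert(found != myStructList.end());
        return found->second;  // already a MyStruct&, no cast needed
    }

    std::size_t size() const { return myStructList.size(); }

private:
    std::map<int, MyStruct> myStructList;
};
```

Returning a reference also documents that the function never returns null.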

Yes,
If you build a generic function without knowing how it will be used, it can be dangerous to return the pointer (or the iterator), since it can become invalid.
I would advise doing one of two things:
1. work with std::shared_ptr and return that. (see below)
2. return the struct by value (can be slower)
// change the definition of the map to
std::map<int, std::shared_ptr<MyStruct>> myStructList;
std::shared_ptr<MyStruct> StructManagementClass::GetStructPtr(int structId)
{
std::map<int, std::shared_ptr<MyStruct>>::iterator foundStruct;
foundStruct = myStructList.find(structId);
if (foundStruct == myStructList.end())
{
// make_shared<MyStruct>() value-initializes the new struct, so no memset is needed
std::shared_ptr<MyStruct> newStruct = std::make_shared<MyStruct>();
newStruct->structId = structId;
// insert returns the iterator to the new element, so no second find is needed
foundStruct = myStructList.insert(std::make_pair(structId, newStruct)).first;
}
return foundStruct->second;
}

Weird behavior with map.find and a pointer to a vector

I have a map of pairs to a vector of vectors that looks like this:
std::map<std::pair<uint16, uint16>, std::vector<std::vector<uint32> > >
The map is populated in the constructor of a class. This class provides a public method that returns a pointer to std::vector<std::vector<uint32> > (the map value part), with something like this:
typedef std::pair<uint16, uint16> key;
typedef std::vector<std::vector<uint32> > value;
value* FindValues(key someKey) {
std::map<key, value>::const_iterator it;
it = someStore.find(someKey);
if (it != someStore.end())
return &(value)it->second;
return NULL;
}
This is when it gets weird. When iterating over the vector returned by FindValues, all child vectors have a large, negative number (such as -1818161232) as their first value. But if I use a function like:
value FindValues(key someKey) {
std::map<key, value>::const_iterator it;
return someStore.find(someKey)->second;
}
...then the value is normal. This only happens for the value at index 0 of all child vectors. With the second method, though, my application segfaults if a key wasn't found (for obvious reasons). What am I doing wrong?

If the return statement truly looks like
return &(value) it->second;
then there are several things that can be said about it:
Your compiler is broken if it accepts it without issuing diagnostic messages. In C++ it is illegal to apply built-in unary & to a result of non-reference cast. The (value) it->second expression produces a temporary object, an rvalue. You can't obtain the address of such object by using &. The code should not even compile.
If your compiler accepts it as some kind of weird "extension", then it means that you are indeed obtaining and returning the address of a temporary object. The temporary object is then immediately destroyed, leaving your pointer pointing to garbage. No wonder you see some weird values through such pointer.
The need for some sort of cast arises from the fact that you used const_iterator to store the result of the search. Apparently you made a misguided attempt to cast away constness of it->second with your (value) cast. The correct way to do it might look as follows
return const_cast<value *>(&it->second);
But why did you use const_iterator in the first place? The right thing to do would be to use a regular iterator and just do
return &it->second;
without any extra casts.
You need to decide what kind of FindValues method you are trying to write. If this is supposed to be a constant method, it should return const value * and should be declared as const
const value* FindValues(key someKey) const
and, of course, you should use const_iterator inside in this case.
If your FindValues is supposed to be a non-constant method, then you can keep the current declaration
value* FindValues(key someKey)
but use ordinary iterator inside.
What you have now is some sort of hybrid of the two, which is what makes you resort to weird casts. (In fact, you will probably need both versions in your class. One can be implemented through the other.)
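For what it's worth, here is a sketch of the two-overload version, with the non-const one implemented through the const one. The class name Store and the Add helper are made up (the question does not name its class), and uint16/uint32 are assumed to be the usual fixed-width integer types.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

typedef std::pair<std::uint16_t, std::uint16_t> key;
typedef std::vector<std::vector<std::uint32_t> > value;

class Store {  // hypothetical name for the class that owns someStore
public:
    void Add(const key& k, const value& v) { someStore[k] = v; }

    // Const version: read-only access through a const_iterator.
    const value* FindValues(const key& someKey) const {
        std::map<key, value>::const_iterator it = someStore.find(someKey);
        return it != someStore.end() ? &it->second : nullptr;
    }

    // Non-const version implemented through the const one. Casting away const
    // is safe here because *this is known to be non-const in this overload.
    value* FindValues(const key& someKey) {
        return const_cast<value*>(
            static_cast<const Store&>(*this).FindValues(someKey));
    }

private:
    std::map<key, value> someStore;
};

// Usage sketch.
inline void find_values_usage() {
    Store s;
    s.Add(key(1, 2), value(2, std::vector<std::uint32_t>(3, 9u)));
    value* p = s.FindValues(key(1, 2));
    assert(p != nullptr);
    assert((*p)[0][0] == 9u);
    assert(s.FindValues(key(5, 5)) == nullptr);
}
```

Neither overload copies the vector, so the returned pointer refers to the element stored in the map.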

Your typedefs are quite misleading. This is the erroneous line:
return &(value)it->second;
What appears to be a simple C-style type cast is actually a call to std::vector's copy constructor. This line could be rewritten as
return &std::vector<std::vector<uint32> >(it->second);
The reason for the weird results becomes visible when you rewrite this line as follows:
std::vector<std::vector<uint32> > result (it->second);
return &result;
You are actually returning the address of a local object that will be destroyed as soon as the function returns.

So, this variant will be better.
typedef std::pair<uint16, uint16> key;
typedef std::vector<std::vector<uint32> > value;
value* FindValues(key someKey) {
std::map<key, value>::const_iterator it;
it = someStore.find(someKey);
if (it != someStore.end())
return &const_cast<value&>(it->second);
return 0;
}

Overloading operator [] for a sparse vector

I'm trying to create a "sparse" vector class in C++, like so:
template<typename V, V Default>
class SparseVector {
...
}
Internally, it will be represented by an std::map<int, V> (where V is the type of value stored). If an element is not present in the map, we will pretend that it is equal to the value Default from the template argument.
However, I'm having trouble overloading the subscript operator, []. I must overload the [] operator, because I'm passing objects from this class into a Boost function that expects [] to work correctly.
The const version is simple enough: check whether the index is in the map, return its value if so, or Default otherwise.
However, the non-const version requires me to return a reference, and that's where I run into trouble. If the value is only being read, I do not need (nor want) to add anything to the map; but if it's being written, I possibly need to put a new entry into the map. The problem is that the overloaded [] does not know whether a value is being read or written. It merely returns a reference.
Is there any way to solve this problem? Or perhaps to work around it?

There may be some very simple trick, but otherwise I think operator[] only has to return something which can be assigned from V (and converted to V), not necessarily a V&. So I think you need to return some object with an overloaded operator=(const V&), which creates the entry in your sparse container.
You will have to check what the Boost function does with its template parameter, though - a user-defined conversion to V affects what conversion chains are possible, for example by preventing there being any more user-defined conversions in the same chain.

Don't let the non-const operator[] implementation return a reference, but a proxy object. You can then implement the assignment operator of the proxy object to distinguish read accesses to operator[] from write accesses.
Here's some code sketch to illustrate the idea. This approach is not pretty, but well - this is C++. C++ programmers don't waste time competing in beauty contests (they wouldn't stand a chance either). ;-)
template <typename V, V Default>
ProxyObject<V, Default> SparseVector<V, Default>::operator[]( int i ) {
// At this point, we don't know whether operator[] was called for a read or for
// a write, so we return a proxy object and defer the decision until later
return ProxyObject<V, Default>( this, i );
}
template <typename V, V Default>
class ProxyObject {
public:
ProxyObject( SparseVector<V, Default> *v, int idx );
ProxyObject<V, Default> &operator=( const V &v ) {
// If we get here, we know that operator[] was called to perform a write access,
// so we can insert an item in the vector if needed
}
operator V() const {
// If we get here, we know that operator[] was called to perform a read access,
// so we can simply return the existing object (or Default if there is none)
}
};
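Fleshed out, the idea might look like the sketch below. It nests the proxy inside the vector class for brevity; the names Proxy, get, and stored are choices made for this example, and whether the Boost function in question accepts a proxy instead of a V& is something you would still have to check.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

template <typename V, V Default>
class SparseVector {
public:
    // Proxy returned by the non-const operator[]; it decides later whether
    // the access is a read (conversion to V) or a write (assignment).
    class Proxy {
    public:
        Proxy(SparseVector& vec, int index) : vec_(vec), index_(index) {}
        Proxy(const Proxy&) = default;

        // Write access: create or overwrite the entry.
        Proxy& operator=(const V& v) {
            vec_.data_[index_] = v;
            return *this;
        }
        // Assigning one proxy to another (a[i] = a[j]) copies the value.
        Proxy& operator=(const Proxy& other) {
            return *this = static_cast<V>(other);
        }

        // Read access: look up without inserting.
        operator V() const { return vec_.get(index_); }

    private:
        SparseVector& vec_;
        int index_;
    };

    Proxy operator[](int i) {
        assert(i >= 0);  // assumption: indices are non-negative
        return Proxy(*this, i);
    }
    V operator[](int i) const { return get(i); }

    std::size_t stored() const { return data_.size(); }  // number of real entries

private:
    V get(int i) const {
        typename std::map<int, V>::const_iterator it = data_.find(i);
        return it != data_.end() ? it->second : Default;
    }

    std::map<int, V> data_;
};
```

Reading v[10] through the proxy leaves the map untouched, while v[10] = 5 creates the entry.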

I wonder whether this design is sound.
If you want to return a reference, that means that clients of the class can store the result of calling operator[] in a reference, and read from/write to it at any later time. If you do not return a reference, and/or do not insert an element every time a specific index is addressed, how could they do this? (Also, I've got the feeling that the standard requires a proper STL container providing operator[] to have that operator return a reference, but I'm not sure of that.)
You might be able to circumvent that by giving your proxy also an operator V&() (which would create the entry and assign the default value), but I'm not sure this wouldn't just open another loophole in some case I hadn't thought of yet.
std::map solves this problem by specifying that the non-const version of that operator always inserts an element (and not providing a const version at all).
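A tiny demonstration of that std::map behavior, as a sketch (the function name is made up):

```cpp
#include <cassert>
#include <map>

// Reading a missing key through operator[] inserts a value-initialized entry;
// find() does not.
inline void map_subscript_inserts() {
    std::map<int, int> m;
    assert(m.find(5) == m.end());
    assert(m.size() == 0);  // find() left the map untouched
    int value = m[5];       // looks like a read, but inserts {5, 0}
    assert(value == 0);
    assert(m.size() == 1);
}
```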
Of course, you can always say this is not an off-the-shelf STL container, and operator[] does not return plain references users can store. And maybe that's OK. I just wonder.
