Boost.Intrusive can get an Iterator out of an Object-Ref or Object-Pointer in constant time (see here: https://www.boost.org/doc/libs/1_72_0/doc/html/intrusive/usage_when.html). How does that work? Why is this not possible for standard containers?
Intrusive containers by definition have the information contained inside of elements to know how they're located in the container. A simple example is an intrusive linked list:
struct Object {
Object* next;
int some_data;
};
Obviously, if I have a reference or pointer to Object, I can easily find the next field, and from there, move to the next element, this is just accessing a member, which is O(1), thus constant time iterators.
With non-intrusive containers, it would look like this:
struct Object {
int some_data;
};
Suppose I have a std::vector of these, and a pointer or reference to Object, I can't work backwards from that to where it is in the std::vector without scanning the container to find it (a O(n) operation).
Related
I ran into some existing code that I inherited that looks up and stores references of structs that are stored in a tbb::concurrent_unordered_map. I know if it was an iterator it would be safe but a reference to the actual object seems fishy.
The code constantly inserts new items into the tbb::concurrent_unordered_map. Can an insert not change the physical location of the items contained inside the tbb map which would make the stored references point to the wrong place as it would for an std::map?
The tbb documentation states:
"Like std::list, insertion of new items does not invalidate any
iterators, nor change the order of items already in the map. Insertion
and traversal may be concurrent."
I know for an std::list the location would not change when inserting new items but because the concurrent_unordered_map documentation talks about the order only I am worried that that does not explicitly say that it can move location.
Below is some pseudo code that demonstrates the concept of the code I ran into.
struct MyStruct {
int i;
int j;
};
//some other thread will insert items into myMap
tbb::concurrent_unordered_map <int, MyStruct> myMap;
MyStruct& getMyStruct (int id)
{
auto itr=myMap.find (id);
if (itr!=myMap.end ()) return itr->second;
static MyStruct dummy {1,2};
return dummy;
}
class MyClass {
public:
MyClass (int id)
:m_myStruct {getMyStruct (id)}
{
}
void DoSomething () {
std::cout<<m_myStruct.i<<std::endl;
}
protected:
MyStruct& m_myStruct; //reference to an item held into a tbb::concurrent_unordered_map
};
This code has been running for over a year at a relatively high frequency which seems to suggest that it is safe but I would like to know for sure that it is safe to keep hold of a reference to an item contained in a tbb::concurrent_unordered_map.
Rewriting the code is a lot of work so I would prefer to not have to do it if it is ok to leave it as is.
Thanks,
Paul
If you are worried so much about going into gray area of not completely documented behavior, why don't you store and use iterators instead? One problem is its size though (run it):
#include <tbb/tbb.h>
int main() {
tbb::concurrent_unordered_map<int, int> m;
printf("Size of iterator: %lu\n", sizeof(m.begin()));
}
This is because besides the pointer to the element, iterator has the pointer to the container it belongs to. There is no list of pointers or any other structure, which registers all the iterators users create, thus the container cannot update an interator to point to a different address (it would be especially tricky in a parallel program). Anyway, such a relocation would cause additional user visible calls to moving/copy constructors of key/value types and thus would be documented. There is no such a warning in the documentation and adding it would cause backward incompatible change, which gives you a sort of indirect guarantee that you can safely keep pointers to your items instead of using iterators.
I can't seem to find much information about whether iterators keep hold of the underlying object they are iterating over.
If I create an iterator, then the object that supplied it goes out of scope, does the presence of the iterator prevent it from being destroyed?
Here is a very simple example just to illustrate the scenario:
// This class takes a copy of iterators to use them later
class Data {
public:
Data(std::vector<int>::iterator start, std::vector<int>::iterator end)
: start(start),
end(end)
{}
void show() {
// Use this->start and this->end for some purpose
}
private:
std::vector<int>::iterator start;
std::vector<int>::iterator end;
};
Data test() {
std::vector<int> v{1, 2, 3};
Data d(v.begin(), v.end());
d.show(); // this would be ok
return d;
}
int main(void) {
Data d = test();
d.show(); // What happens here?
}
In this example, the Data object is storing a copy of the iterators, which is fine for the first show() call. However by the time of the second show() call, the original object that supplied the iterators no longer exists.
Do the iterators keep the object around until such time as they are all themselves destroyed, or are the iterators invalidated as soon as the original object goes out of scope?
Here is one reference of many which doesn't say what happens one way or the other (or even whether the result of this is 'undefined'.)
Iterators typically don't own the data over which they iterate, no. In fact, they're rarely (if ever) even aware of the object that owns the data; vector iterators, for example, are often just pointers, which have no knowledge of any vector or of its lifetime. Even those iterators that are not implemented as pointers (which is most of them) may be considered a kind of "pointer", and treated as such: they can quite easily become dangling.
Your example has UB because you'll dereference invalid iterators inside show() the second time.
If your container goes out of scope then all your iterators become invalidated. In fact, there are all manner of reasons why an iterator may become invalidated, such as adding to a vector when that operation results in a capacity expansion.
It's possible to find iterators that do kind of "own" data, instead of iterating over some collection found elsewhere (such as Boost's counting iterators), but these are magical properties that take advantage of C++ to provide a magical function, not an inherent property of iterators as defined by C++.
An iterator is generally only valid as long as its originating container or "sequence" has not been changed, because a change might cause memory reallocation and memory moves. Since the iterator usually reference memory in the originating container, a change in said container might invalidate the iterator.
Now, a container that goes out of scope gets its destructor executed. That will obviously change the container and hence any iterator to it will be invalidated in the process.
First, iterator does not have an interface to reference an object it iterates over. It only implements pointer semantics, so you may think of it as of abstract pointer. Of course, it's internal implementation may hold a pointer to that object, but it's very unlikely in real-world implementations.
Second, when your container is destroyed (and it is when it goes out of scope), all objects in the container are being destroyed too. Thus, iterator becomes invalid after you container was destroyed. After that incrementing, decrementing and dereferencing the iterator will cause undefined behavior.
I'm in need of a container that has the properties of both a vector and a list. I need fast random access to elements within the container, but I also need to be able to remove elements in the middle of the container without moving the other elements. I also need to be able to iterate over all elements in the container, and see at a glance (without iteration) how many elements are in the container.
After some thought, I've figured out how I could create such a container, using a vector as the base container, and wrapping the actual stored data within a struct that also contained fields to record whether the element was valid, and pointers to the next/previous valid element in the vector. Combined with some overloading and such, it sounds like it should be fairly transparent and fulfill my requirements.
But before I actually work on creating yet another container, I'm curious if anyone knows of an existing library that implements this very thing? I'd rather use something that works than spend time debugging a custom implementation. I've looked through the Boost library (which I'm already using), but haven't found this in there.
If the order does not matter, I would just use a hash table mapping integers to pointers. std::tr1::unordered_map<int, T *> (or std::unordered_map<int, unique_ptr<T>> if C++0x is OK).
The hash table's elements can move around which is why you need to use a pointer, but it will support very fast insertion / lookup / deletion. Iteration is fast too, but the elements will come out in an indeterminate order.
Alternatively, I think you can implement your own idea as a very simple combination of a std::vector and a std::list. Just maintain both a list<T> my_list and a vector<list<T>::iterator> my_vector. To add an object, push it onto the back of my_list and then push its iterator onto my_vector. (Set an iterator to my_list.end() and decrement it to get the iterator for the last element.) To lookup, look up in the vector and just dereference the iterator. To delete, remove from the list (which you can do by iterator) and set the location in the vector to my_list.end().
std::list guarantees the elements within will not move when you delete them.
[update]
I am feeling motivated. First pass at an implementation:
#include <vector>
#include <list>
template <typename T>
class NairouList {
public:
typedef std::list<T> list_t;
typedef typename list_t::iterator iterator;
typedef std::vector<iterator> vector_t;
NairouList() : my_size(0)
{ }
void push_back(const T &elt) {
my_list.push_back(elt);
iterator i = my_list.end();
--i;
my_vector.push_back(i);
++my_size;
}
T &operator[](typename vector_t::size_type n) {
if (my_vector[n] == my_list.end())
throw "Dave's not here, man";
return *(my_vector[n]);
}
void remove(typename vector_t::size_type n) {
my_list.erase(my_vector[n]);
my_vector[n] = my_list.end();
--my_size;
}
size_t size() const {
return my_size;
}
iterator begin() {
return my_list.begin();
}
iterator end() {
return my_list.end();
}
private:
list_t my_list;
vector_t my_vector;
size_t my_size;
};
It is missing some Quality of Implementation touches... Like, you probably want more error checking (what if I delete the same element twice?) and maybe some const versions of operator[], begin(), end(). But it's a start.
That said, for "a few thousand" elements a map will likely serve at least as well. A good rule of thumb is "Never optimize anything until your profiler tells you to".
Looks like you might be wanting a std::deque. Removing an element is not as efficient as a std::list, but because deque's are typically created by using non-contiguous memory "blocks" that are managed via an additional pointer array/vector internal to the container (each "block" would be an array of N elements), removal of an element inside of a deque does not cause the same re-shuffling operation that you would see with a vector.
Edit: On second though, and after reviewing some of the comments, while I think a std::deque could work, I think a std::map or std::unordered_map will actually be better for you since it will allow the array-syntax indexing you want, yet give you fast removal of elements as well.
Been a while since I've used C++. Can I do something like this?:
for (vector<Node>::iterator n = active.begin(); n!=active.end(); ++n) {
n->ax /= n->m;
}
where Node is an object with a few floats in it?
If written in Java, what I'm trying to accomplish is something similar to:
for (Node n : this.active) {
n.ax /= n.m;
}
where active is an arrayList of Node objects.
I think I am forgetting some quirk about passing by reference or something throws hands in the air in desperation
Yes. This syntax basically works for almost all STL containers.
// this will walk it the container from the beginning to the end.
for(container::iterator it = object.begin(); it != object.end(); it++)
{
}
object.begin() - basically gives an iterator the first element of the container.
object.end() - the iterator is set to this value once it has gone through all elements. Note that to check the end we used !=.
operator ++ - Move the iterator to the next element.
Based on the type of container you may have other ways to navigate the iterator (say backwards, randomly to a spot in the container, etc). A good introduction to iterators is here.
Short answer: yes, you can.
The iterator is a proxy for the container element. In some cases the iterator is literally just a pointer to the element.
Your code works fine for me
#include <vector>
using std::vector;
struct Node{
double ax;
double m;
};
int main()
{
vector<Node> active;
for (vector<Node>::iterator n = active.begin(); n!=active.end(); ++n) {
n->ax /= n->m;
}
}
You can safely change an object contained in a container without invalidating iterators (with the associative containers, this applies only to the 'value' part of the element, not the 'key' part).
However, what you might be thinking of is that if you change the container (say by deleting or moving the element), then existing iterators might be invalidated, depending on the container, the operation being performed, and the details of the iterators involved (which is why you aren't allowed to change the 'key' of an object in an associative container - that would necessitate moving the object in the container in the general case).
In the case of std::vector, yes, you can manipulate the object simply by dereferencing the iterator. In the case of sorted containers, such as a std::set, you can't do that. Since a set inserts its elements in order, directly manipulating the contents isn't permitted since it might mess up the ordering.
What you have will work. You can also use one of the many STL algorithms to accomplish the same thing.
std::for_each(active.begin(), active.end(), [](Node &n){ n.ax /= n.m; });
This is a pretty straightforward architectural question, however it's been niggling at me for ages.
The whole point of using a list, for me anyway, is that it's O(1) insert/remove.
The only way to have an O(1) removal is to have an iterator for erase().
The only way to get an iterator is to keep hold of it from the initial insert() or to find it by iteration.
So, what to pass around; an Iterator or a pointer?
It would seem that if it's important to have fast removal, such as some sort of large list which is changing very frequently, you should pass around an iterator, and if you're not worried about the time to find the item in the list, then pass around the pointer.
Here is a typical cut-down example:
In this example we have some type called Foo. Foo is likely to be a base class pointer, but it's not here for simplicity.
Then we have FooManger, which holds a list of shared_ptr, FooPtr . The manager is responsible for the lifetime of the object once it's been passed to it.
Now, what to return from addFoo()?
If I return a FooPtr then I can never remove it from the list in O(1), because I will have to find it in the list.
If I return a std::list::iterator, FooPtrListIterator, then anywhere I need to remove the FooPtr I can, just by dereferencing the iterator.
In this example I have a contrived example of a Foo which can kill itself under some circumstance, Foo::killWhenConditionMet().
Imagine some Foo that has a timer which is ticking down to 0, at which point it needs to ask the manager to delete itself. The trouble is that 'this' is a naked Foo*, so the only way to delete itself, is to call FooManager::eraseFoo() with a raw pointer. Now the manager has to search for the object pointer to get an iterator so it can be erased from the list, and destroyed.
The only way around that is to store the iterator in the object. i.e Foo has a FooPtrListIterator as a member variable.
struct Foo;
typedef boost::shared_ptr<Foo> FooPtr;
typedef std::list<FooPtr> FooPtrList;
typedef FooPtrList::iterator FooPtrListIterator;
struct FooManager
{
FooPtrList l;
FooPtrListIterator addFoo(Foo *foo) {
return l.insert(l.begin(), FooPtr(foo));
}
void eraseFoo(FooPtrListIterator foo) {
l.erase(foo);
}
void eraseFoo(Foo *foo) {
for (FooPtrListIterator it=l.begin(), ite=l.end(); it!=ite; ++it) {
if ((*it).get()==foo){
eraseFoo(it);
return;
}
}
assert("foo not found!");
}
};
FooManager g_fm;
struct Foo
{
int _v;
Foo(int v):_v(v) {
}
~Foo() {
printf("~Foo %d\n", _v);
}
void print() {
printf("%d\n", _v);
}
void killWhenConditionMet() {
// Do something that will eventually kill this object, like a timer
g_fm.eraseFoo(this);
}
};
void printList(FooPtrList &l)
{
printf("-\n");
for (FooPtrListIterator it=l.begin(), ite=l.end(); it!=ite; ++it) {
(*it)->print();
}
}
void test2()
{
FooPtrListIterator it1=g_fm.addFoo(new Foo(1));
printList(g_fm.l);
FooPtrListIterator it2=g_fm.addFoo(new Foo(2));
printList(g_fm.l);
FooPtrListIterator it3=g_fm.addFoo(new Foo(3));
printList(g_fm.l);
(*it2)->killWhenConditionMet();
printList(g_fm.l);
}
So, the questions I have are:
1. If an object needs to delete itself, or have some other system delete it, in O(1), do I have to store an iterator to object, inside the object? If so, are there any gotchas to do with iterators becoming invalid due other container iterations?
Is there simply another way to do this?
As a side question, does anyone know why and of the 'push*' stl container operations don't return the resultant iterator, meaning one has to resort to 'insert*'.
Please, no answers that say "don't pre-optimise", it drives me nuts. ;) This is an architectural question.
C++ standard in its [list.modifiers] section says that any list insertion operation "does not affect the validity of iterators and references", and any removal operation "invalidates only the iterators and references to the erased elements". So keeping iterators around would be safe.
Keeping iterators inside the objects also seems sane. Especially if you don't call them iterators, but rather name like FooManagerHandlers, which are processed by removal function in an opaque way. Indeed, you do not store "iterators", you store "representatives" of objects in an organized structure. These representatives are used to define a position of an object inside that structure. This is a separate, quite a high-level concept, and there's nothing illogical in implementing it.
However, the point of using lists is not just O(1) insert/remove, but also keeping elements in an order. If you don't need any order, then you would probably find hash tables more useful.
The one problem I see with storing the iterator in the object is that you must be careful of deleting the object from some other iterator, as your objects destructor does not know where it was destroyed from, so you can end up with an invalid iterator in the destructor.
The reason that push* does not return an iterator is that it is the inverse of pop*, allowing you to treat your container as a stack, queue, or deque.