dynamic vector-like container but whose elements save their indexes? - c++

All the elements should have fixed position in the array after insertion until I explicitly delete them from there. Is there something like this in boost or wherever? Thanks

Use an unordered_map<int, T> or map<int, T>.
Or, use a vector<optional<T>>, and instead of actually deleting a slot, reset it to empty (boost::none with boost::optional, or std::nullopt with std::optional).

Instead of "deleting" an element, you want to set its value to null (or some other "no value" equivalent). Then everything stays constant as you require.

Interesting. Is your goal to expose a mapping from integers to SLOTS, where those SLOTS may contain a value? Or is your goal to preserve the underlying address of each element and the underlying address of the start of the internal array itself? Presumably you have a reason that you need either the position of the elements or the mapping from integer keys to elements to be preserved after an element is "removed". What is that reason?
The map<> or vector<> implementations mentioned above may not work because the remove, erase, find, etc. operations will remove, rearrange, or inspect the integers which you consider to be "removed".
Unfortunately, I think this may be a case where you need to roll your own using a wrapper around vector<optional<T> > or vector<T*>, depending upon how you define remove.
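A minimal sketch of such a wrapper, assuming C++17's std::optional rather than boost::optional. The class name SlotVector and its member functions are hypothetical, chosen only for illustration; "remove" is interpreted here as emptying the slot so that every other element keeps its index.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical wrapper: indexes stay fixed because erase() only empties a slot.
template <typename T>
class SlotVector {
public:
    // Appends a value and returns the index it keeps until it is erased.
    std::size_t insert(T value) {
        slots_.emplace_back(std::move(value));
        return slots_.size() - 1;
    }

    // Empties the slot; no other element moves. Throws if index is out of range.
    void erase(std::size_t index) { slots_.at(index).reset(); }

    // True if the index is in range and the slot still holds a value.
    bool contains(std::size_t index) const {
        return index < slots_.size() && slots_[index].has_value();
    }

    // Access to a live element; throws if the slot is out of range or empty.
    T& at(std::size_t index) { return slots_.at(index).value(); }

private:
    std::vector<std::optional<T>> slots_;
};

// Small demonstration of the fixed-index behaviour.
inline void slot_vector_demo() {
    SlotVector<int> v;
    const std::size_t a = v.insert(10);
    const std::size_t b = v.insert(20);
    v.erase(a);
    assert(!v.contains(a));
    assert(v.at(b) == 20);  // b still refers to the second slot
}
```

Erased slots are never reused in this sketch, so the vector only grows. If that matters, a free list of empty indexes could be added on top.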

Related

How to use C++ std::sets as building blocks of a class?

I need a data structure that satisfies the following:
stores an arbitrary number of elements, where each element is described by 10 numeric metrics
allows fast (log n) search of elements by any of the metrics
allows fast (log n) insertion of new elements
allows fast (log n) removal of elements
And let's assume that the elements are expensive to construct.
I came up with the following plan
store all elements in a vector called DATA.
use 10 std::sets, one for each of 10 metrics. Each std::set is light-weight, it contains only integers, which are indexes into the vector DATA. The comparison operators 'look up' the appropriate element in DATA and then select the appropriate metric
template <int C>
struct Cmp
{
    bool operator() (int const a, int const b)
    {
        return ( DATA[a].coords[C] != DATA[b].coords[C] )
            ? ( DATA[a].coords[C] < DATA[b].coords[C] )
            : ( a < b );
    }
};
Elements are never modified or removed from a vector. A new element is pushed back to DATA and then its index (DATA.size()-1) is inserted into the sets (set<int, Cmp<..> >). To remove an element, I set a flag in the element saying that it is deleted (without actually removing it from the DATA vector) and then remove the element index from all ten std::sets.
This works fine as long as DATA is a global variable. (It also somewhat abuses the type system by making the templated struct Cmp dependent on a global variable.)
However, I was not able to enclose the DATA vector and std::set's (set<int, Cmp<...> >) inside a class and then 'index' DATA with those std::sets. For starters, the comparison operator Cmp defined inside an outer class has no access to the outer class' fields (so it cannot access DATA). I also cannot pass the vector to the Cmp constructor because Cmp is being constructed by std::set and std::set expects a comparison operator with a constructor that has no arguments.
I have a feeling I'm working against C++ type system and trying to achieve something that the type system is purposely preventing me from doing. (I'm trying to make std::set depend on a variable that is going to be constructed only at runtime.) And while I understand why the type system might not like what I do, I think this is a legitimate use case.
Is there a way to implement the data structure/class I described above without providing a re-implementation of std::set/red-black tree? I hope there may be a trick I have not thought of yet. (And yes, I know that boost has something, but I'd like to stick to the standard library.)
When I read something like "look up foo by a value bar", my initial reaction is to use a map<> or something similar. There are some implications to this though:
Keys in an std::map (or values in an std::set) are unique, so no two elements can share the same key and accordingly no two data objects would be able to have the same metric. If multiple data objects can have the same metric (this isn't clear from your question), using an std::multimap (or std::multiset) would work though.
If the keys are constant and stored within the elements themselves, using a set<data*,cmp> is a common approach. The comparator then just retrieves the according field from the objects and compares them. Lookup then requires creating a temporary object and using find() with it. Some implementations also have an extension that allows searching with a different type, which would make this much easier but also make porting require actual work.
It is important that the fields used as keys remain constant though, because if you modify them, you implicitly change the order of the set<>. This is the reason that a set<>'s elements are effectively constant, i.e. even a plain iterator has a constant as value type. If you store pointers though, you can easily get around that, because a constant pointer is something different than a pointer to a constant. Don't shoot yourself in the foot with that!
If the metrics are not so much a property of the objects themselves (or you don't mind redundantly storing them), using an std::map would be a natural choice. Storing the same object under multiple keys, depending on the metric, can be done in separate containers (map<int,data*> c[10];). However, you can do that in a single map using e.g. a pair<metric,value> as key (map<pair<int,int>,data*> c;).
Using a vector<> to store the actual elements and only referencing them as either pointers or indices in a map surely works. I'd take the pointers though, as this is what allows the above approaches using a set or map to work. Without that, the comparator would have to store a reference to the container, where at the moment it just uses the global DATA container. Getting this to work with a vector is tricky though, since it reallocates its elements when growing, as you correctly pointed out. I'd consider a different container type, like std::list or std::deque. The former would allow erasing elements, too, but it has a higher per-element overhead. The latter has a relatively low per-element overhead, only slightly above std::vector. You could then even go so far as to store iterators instead of pointers, which helps debugging provided you use a "checked STL" for that. Still, you will have to do some manual bookkeeping which object is still referenced somewhere and which one isn't.
Instead of using a separate container, you could also allocate the elements dynamically, although that itself has some overhead. If the overhead per element is not an issue, you could then use reference-counted smart pointers. If the application is a one-shot process, you could also use raw pointers and let the OS reclaim the memory on exit.
Note that I assume that storing multiple copies of the data objects is not an option. If that was the case, you could just as well have a map<int,data> m[10];, where each map stores its own copy of the data objects. All the bookkeeping issues would then be resolved, but at the price of a 10x overhead.
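For the variant where the comparator stores a reference to the container, here is a rough sketch. It relies on the fact that std::set has a constructor taking a comparator object, so the comparator does not need to be default-constructible. The names Element, ByMetric and MetricIndex are hypothetical, and the metric is a runtime member rather than a template parameter, which is an assumption made to keep all ten sets the same type.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical element type: ten numeric metrics per element.
struct Element {
    std::array<double, 10> coords{};
};

// Comparator that carries a pointer to the storage plus the metric it orders by.
struct ByMetric {
    const std::vector<Element>* data;
    std::size_t metric;

    bool operator()(std::size_t a, std::size_t b) const {
        const double x = (*data)[a].coords[metric];
        const double y = (*data)[b].coords[metric];
        return x != y ? x < y : a < b;  // tie-break on index, as in the question
    }
};

class MetricIndex {
public:
    MetricIndex() {
        // One set per metric, each constructed with its own comparator object.
        for (std::size_t m = 0; m < 10; ++m)
            sets_.emplace_back(ByMetric{&data_, m});
    }
    // The comparators point at this object's data_, so copying is disabled.
    MetricIndex(const MetricIndex&) = delete;
    MetricIndex& operator=(const MetricIndex&) = delete;

    // Stores the element and indexes it under every metric.
    std::size_t add(const Element& e) {
        data_.push_back(e);
        const std::size_t id = data_.size() - 1;
        for (auto& s : sets_) s.insert(id);
        return id;
    }

    // Removes the index from all sets; data_[id] itself stays in place.
    void remove(std::size_t id) {
        for (auto& s : sets_) s.erase(id);
    }

    // Index of the element with the smallest value of the given metric.
    std::size_t min_by(std::size_t metric) const {
        assert(!sets_.at(metric).empty());
        return *sets_.at(metric).begin();
    }

private:
    std::vector<Element> data_;
    std::vector<std::set<std::size_t, ByMetric>> sets_;
};
```

Because the sets hold indexes rather than pointers to elements, reallocation of data_ does not affect them; the comparator only needs the address of the vector object itself, which does not change.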

How to retrieve the elements from map in the order of insertion?

I have a map which stores <int, char *>. Now I want to retrieve the elements in the order they have been inserted. std::map returns the elements ordered by the key instead. Is it even possible?
If you are not concerned about the order based on the int (IOW you only need the order of insertion and you are not concerned with the normal key access), simply change it to a vector<pair<int, char*>>, which is by definition ordered by the insertion order (assuming you only insert at the end).
If you want to have two indices simultaneously, you need, well, Boost.MultiIndex or something similar. You'd probably need to keep a separate variable that would only count upwards (would be a steady counter), though, because you could use .size()+1 as new "insertion time key" only if you never erased anything from your map.
Now I want to retrieve the elements in the order they have been inserted. [...] Is it even possible?
No, not with std::map. std::map inserts the element pair into an already ordered tree structure (and after the insertion operation, the std::map has no way of knowing when each entry was added).
You can solve this in multiple ways:
use a std::vector<std::pair<int,char*>>. This will work, but not provide the automatic sorting that a map does.
use Boost stuff (@BartekBanachewicz suggested Boost.MultiIndex)
Use two containers and keep them synchronized: one with a sequential insert (e.g. std::vector) and one with indexing by key (e.g. std::map).
use/write a custom container yourself, so that supports both type of indexing (by key and insert order). Unless you have very specific requirements and use this a lot, you probably shouldn't need to do this.
A couple of options (other than those that Bartek suggested):
If you still want key-based access, you could use a map, along with a vector which contains all the keys, in insertion order. This gets inefficient if you want to delete elements later on, though.
You could build a linked list structure into your values: instead of the values being char*s, they're structs of a char* and the previously- and nextly-inserted keys**; a separate variable stores the head and tail of the list. You'll need to do the bookkeeping for that on your own, but it gives you efficient insertion and deletion. It's more or less what boost.multiindex will do.
** It would be nice to store map iterators instead, but this leads to circular definition problems.
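As a rough illustration of the "map plus a vector of keys in insertion order" option, here is a sketch. The class name InsertionOrderedMap is hypothetical, and it stores std::string values instead of the question's char* purely to keep ownership simple. Erasing is left out, since that is the part noted above as inefficient with this layout.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container: key lookup through the map, insertion order through the vector.
class InsertionOrderedMap {
public:
    // Inserts only if the key is new; returns whether an insertion happened.
    bool insert(int key, std::string value) {
        const bool inserted = by_key_.emplace(key, std::move(value)).second;
        if (inserted) order_.push_back(key);
        return inserted;
    }

    // Key-based access; returns nullptr if the key is absent.
    const std::string* find(int key) const {
        const auto it = by_key_.find(key);
        return it == by_key_.end() ? nullptr : &it->second;
    }

    // Values in the order they were inserted.
    std::vector<std::string> values_in_insertion_order() const {
        std::vector<std::string> out;
        for (int key : order_) out.push_back(by_key_.at(key));
        return out;
    }

private:
    std::map<int, std::string> by_key_;
    std::vector<int> order_;
};

// Small demonstration: key order and insertion order differ.
inline void insertion_order_demo() {
    InsertionOrderedMap m;
    m.insert(3, "c");
    m.insert(1, "a");
    const std::vector<std::string> v = m.values_in_insertion_order();
    assert(v.size() == 2 && v[0] == "c" && v[1] == "a");
}
```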

C++ map allocator stores items in a vector?

Here is the problem I would like to solve: in C++, iterators for map, multimap, etc are missing two desirable features: (1) they can't be checked at run-time for validity, and (2) there is no operator< defined on them, which means that they can't be used as keys in another associative container. (I don't care whether the operator< has any relationship to key ordering; I just want there to be some < available at least for iterators to the same map.)
Here is a possible solution to this problem: convince map, multimap, etc to store their key/data pairs in a vector, and then have the iterators be a small struct that contain a pointer to the vector itself and a subscript index. Then two iterators, at least for the same container, could be compared (by comparing their subscript indices), and it would be possible to test at run time whether an iterator is valid.
Is this solution achievable in standard C++? In particular, could I define the 'Allocator' for the map class to actually put the items in a vector, and then define the Allocator::pointer type to be the small struct described in the last paragraph? How is the iterator for a map related to the Allocator::pointer type? Does the Allocator::pointer have to be an actual pointer, or can it be anything that supports a dereference operation?
UPDATE 2013-06-11: I'm not understanding the responses. If the (key,data) pairs are stored in a vector, then it is O(1) to obtain the items given the subscript, only slightly worse than if you had a direct pointer, so there is no change in the asymptotics. Why does a responder say map iterators are "not kept around"? The standard says that iterators remain valid as long as the item to which they refer is not deleted. As for the 'real problem': say I use a multimap for a symbol table (variable name->storage location; it is a multimap rather than map because the variables names in an inner scope may shadow variables with the same name), and say now I need a second data structure keyed by variables. The apparently easiest solution is to use as key for the second map an iterator to the specific instance of the variable's name in the first map, which would work if only iterators had an operator<.
I think not.
If you were somehow able to "convince" map to store its pairs in a vector, you would fundamentally change certain (at least two) guarantees on the map:
insert, erase and find would no longer be logarithmic in complexity.
insert would no longer be able to guarantee the validity of unaffected iterators, as the underlying vector would sometimes need to be reallocated.
Taking a step back though, two things suggest to me that you are trying to "solve" the wrong problem.
First, it is unusual to need to have a vector of iterators.
Second, it is unusual to need to check an iterator for validity, as iterators are not generally kept around.
I wonder what the real problem is that you are trying to solve?
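To make the iterator-validity guarantee mentioned above concrete, here is a small check one could run. It is only a sketch; the function name is hypothetical. It verifies that an iterator and an element address obtained from a std::map are unchanged after many later insertions, which is the property a vector-backed store would lose on reallocation.

```cpp
#include <cassert>
#include <map>
#include <string>

// Checks that an iterator and a pointer to a map element survive later insertions.
inline bool map_element_stays_put() {
    std::map<int, std::string> m;
    m[0] = "first";
    const auto it = m.find(0);
    const std::string* before = &it->second;

    // Many insertions; a std::map never relocates existing nodes when inserting.
    for (int i = 1; i <= 10000; ++i) m[i] = "filler";

    // The old iterator is still valid and refers to the same element.
    return it->first == 0 && &it->second == before && &m.find(0)->second == before;
}

inline void map_stability_demo() {
    assert(map_element_stays_put());
}
```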

Dynamic size of array in c++?

I am confused. I don't know what container I should use. Let me tell you what I need first. Basically I need a container that can store X number of objects (the number of objects is unknown; it could be 1 - 50k).
I read a lot. Over here, array vs list, it says: an array needs to be resized if the number of objects is unknown (I am not sure how to resize an array in C++), and it also states that with a linked list, if you want to search for a certain item, it will loop through (iterate) from first to end (or vice versa), while an array can specify "array object at index".
Then I went for an other solution, map, vector, etc. Like this one: array vs vector. Some responder says never use array.
I am new to C++, I only used array, vector, list and map before. Now, for my case, what kind of container you will recommend me to use? Let me rephrase my requirements:
Need to be a container
The number of objects stored is unknown but is huge (1 - 40k maybe)
I need to loop through the containers to find specific object
std::vector is what you need.
You have to consider 2 things when selecting a stl container.
Data you want to store
Operations you want to perform on the stored data
There was a good diagram in a question here on SO that depicts this. I cannot find the link to it, but I saved it a long time ago; here it is: [diagram not reproduced]
You cannot resize an array in C++, not sure where you got that one from. The container you need is std::vector.
The general rule is: use std::vector until it doesn't work, then shift to something that does. There are all sorts of theoretical rules about which one is better, depending on the operations, but I've regularly found that std::vector outperforms the others, even when the most frequent operations are things where std::vector is supposedly worse. Locality seems more important than most of the theoretical considerations on a modern machine.
The one reason you might shift from std::vector is because of iterator validity. Inserting into a std::vector may invalidate iterators; inserting into a std::list never does.
Do you need to loop through the container, or you have a key or ID for your objects?
If you have a key or ID - you can use map to be able to quickly access the object by it, if the id is the simple index - then you can use vector.
Otherwise you can iterate through any container (they all have iterators) but list would be the best if you want to be memory efficient, and vector if you want to be performance oriented.
You can use vector. But if you need to find objects in the container, then consider using set, multiset or map.
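For the "loop through the container to find a specific object" requirement, a minimal sketch with std::vector and std::find_if might look like this. The Object type and the find_by_id helper are hypothetical and only stand in for whatever the real objects and search criterion are.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical object type for illustration.
struct Object {
    int id;
    std::string name;
};

// Linear search: returns a pointer to the first object with the given id, or nullptr.
inline const Object* find_by_id(const std::vector<Object>& objects, int id) {
    const auto it = std::find_if(objects.begin(), objects.end(),
                                 [id](const Object& o) { return o.id == id; });
    return it == objects.end() ? nullptr : &*it;
}

// Small demonstration: the vector grows as needed, no manual resizing.
inline void find_by_id_demo() {
    std::vector<Object> objects;
    objects.push_back({1, "first"});
    objects.push_back({2, "second"});
    assert(find_by_id(objects, 2) != nullptr);
    assert(find_by_id(objects, 9) == nullptr);
}
```

The returned pointer is only safe to use until the vector is next modified, since growth may reallocate the elements.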

set vs vector with custom iterator

I understand this question may be quickly flagged as a duplicate of many other more popular questions, but I'll still ask it:
I need a container that provides duplicate checking on insert (like std::set), but allows me to modify elements already present (like std::vector). It should also be relatively fast to search for elements (which would prefer std::set again). Would it be better to use a vector and perhaps a custom duplicate-checking insert_iterator instead of modifying set elements by erasing and reinserting them?
Thanks
What is to stop you from using a std::set? If you need to modify an element, copy it, erase it, then re-insert.
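The erase-then-reinsert idea can be sketched roughly as follows, assuming a std::set<std::string>. The helper name replace_in_set is hypothetical, and refusing the change when the new value already exists is one possible policy, not the only one.

```cpp
#include <cassert>
#include <set>
#include <string>

// Replaces old_value with new_value in the set.
// Returns false (and leaves the set unchanged) if old_value is missing
// or new_value would be a duplicate of another element.
inline bool replace_in_set(std::set<std::string>& s,
                           const std::string& old_value,
                           const std::string& new_value) {
    const auto it = s.find(old_value);
    if (it == s.end()) return false;
    if (old_value == new_value) return true;    // nothing to change
    if (s.count(new_value) != 0) return false;  // would create a duplicate
    s.erase(it);
    s.insert(new_value);
    return true;
}

// Small demonstration of a successful and a rejected modification.
inline void replace_in_set_demo() {
    std::set<std::string> s{"a", "b"};
    assert(replace_in_set(s, "a", "c"));
    assert(!replace_in_set(s, "b", "c"));
}
```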
Have you looked into using a map?
A map may be a good solution to your problem.
If your strings are long and performance is critical then you may be stuck with a custom container that wraps something like a parallel vector<string> and set<string *>. Provide a custom comparator for the set so that it dereferences through the pointer to make the comparisons. To modify an element, remove the pointer from the set, modify the string, then reinsert the pointer.
This gets a bit messy when you want to remove container elements, so you would want to use some form of lazy deletion. At that point you are very close to a full-blown free-object pool for your strings.
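A rough sketch of the parallel-storage idea, without element removal. Note one swapped-in detail: it uses std::deque<std::string> instead of vector<string> for the storage, because push_back on a deque does not move existing elements, so the pointers held in the set stay valid. The names DerefLess and UniqueStrings are hypothetical, and std::optional assumes C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <set>
#include <string>

// Orders pointers by the strings they point to.
struct DerefLess {
    bool operator()(const std::string* a, const std::string* b) const { return *a < *b; }
};

// Hypothetical container: unique strings that can be modified in place.
class UniqueStrings {
public:
    UniqueStrings() = default;
    // The set holds pointers into this object's storage_, so copying is disabled.
    UniqueStrings(const UniqueStrings&) = delete;
    UniqueStrings& operator=(const UniqueStrings&) = delete;

    // Returns the slot of the new string, or std::nullopt if it is a duplicate.
    std::optional<std::size_t> insert(const std::string& s) {
        if (index_.count(&s) != 0) return std::nullopt;
        storage_.push_back(s);
        index_.insert(&storage_.back());
        return storage_.size() - 1;
    }

    // Changes the string in a slot; refuses if the new value would be a duplicate.
    bool modify(std::size_t slot, const std::string& new_value) {
        std::string& target = storage_.at(slot);
        if (target == new_value) return true;
        if (index_.count(&new_value) != 0) return false;
        index_.erase(&target);   // remove the pointer before the key changes
        target = new_value;
        index_.insert(&target);  // reinsert under the new ordering
        return true;
    }

    bool contains(const std::string& s) const { return index_.count(&s) != 0; }

private:
    std::deque<std::string> storage_;  // push_back keeps element addresses stable
    std::set<const std::string*, DerefLess> index_;
};

// Small demonstration: duplicates are rejected, in-place modification works.
inline void unique_strings_demo() {
    UniqueStrings u;
    const auto a = u.insert("apple");
    assert(a.has_value());
    assert(!u.insert("apple").has_value());
    assert(u.modify(*a, "cherry"));
    assert(u.contains("cherry") && !u.contains("apple"));
}
```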
If you're using a vector of strings in performance-critical code, then watch out for the vector reallocations which would manually copy each string into the new memory chunk. You can bypass that by watching for an upcoming reallocation, creating a new vector of empty strings (pre-reserved to double size), and then using string::swap on each element to move the old data into the new larger vector.
Things will be much nicer when c++0x move semantics are widely available.