Is there anything built into the C++ Standard Library that allows me to work in a priority queue/heap like data structure (i.e., can always pop the highest value from the list, can define how the highest value is determined for custom classes, etc.) but allows me to update the keys in the heap? I'm dealing with fairly simple data, pairs to be exact, but I need to be able to update the value of a given key within the heap easily for my algorithm to function. WHat is the best way to achieve this in C++?
Binary heaps (which are how priority queues are implemented in the C++ standard library) do not support arbitrary update-key operations. A common method if updates are infrequent is to extrinsically flag the original item as invalid, and reinsert the value with the new key; when an invalid value is popped, it is ignored.
The alternative is using a different PQ implementation which does support update-key, such as a binomial heap. Binomial heaps have the particular advantage of being manipulated by swinging pointers, instead of moving values. This streamlines the task of implementing operations like update-key and delete.
I'm not sure what you're take on Boost is, but I always consider a kind of almost standard library (some boost functionality has even ended up in standard library). In any case, if you're ok with using boost, then Boost.Heap provides a priority queue with updatable priority.
Like most boost libraries, it's header-only, so there's no linker hassles to go through and it won't make your build system any more complex. You can just #include it and use it.
I don't have the ability to comment on your question, but here is my understanding.
Your mentioned pairs, and it sounds like you need the ability to change the priority from some function of your first element of the pair to your second element.
That is, you want to initially use FirstComparator below, but then switch to SecondComparator.
typedef std::pair<X, Y> MyPair;
struct FirstComparator
{
bool operator() (const MyPair& left, const MyPair& right)
{
return left.first < right.first;
}
}
struct SecondComparator
{
bool operator() (const MyPair& left, const MyPair& right)
{
return left.second < right.second;
}
}
Because std::priority_queue is a template that includes a sorting criterion (as your question mentioned), you can create a second container of a different type. (I wrote less-than comparisons, so we have a min-heap.)
You will need to transfer the members into it.
std::priority_queue<MyPair, std::vector<MyPair>, FirstComparator> firstQueue;
// Populate and use firstQueue
// Now create an initially-empty queue sorting according to other criterion,
// and copy elements in.
std::priority_queue<MyPair, std::vector<MyPair>, SecondComparator> secondQueue;
// Use 'push' instead of 'emplace' for pre-C++11 code
while (! firstQueue.empty())
secondQueue.emplace(firstQueue.pop());
// Use secondQueue
An alternative approach is to use a single std::vector, and resort it with std::sort using different sorting criteria. C++11 allows you to create named or anonymous lambda functions for establishing such sorting criteria on-the-fly.
Since your question specifically involved priority queues, I won't get into that unless you're specifically interested.
Related
I have a container which (among other things) exposes a string buffer, and the upper case version of that string buffer. (Well, it isn't just upper case, but it is similar in concept) I want to allow a caller to do something similar to:
container c("Example");
auto const iter = c.begin() + 2;
std::printf("%c\n", iter->get_source()); // Prints a
std::printf("%c\n", iter->get_upper()); // Prints A
iter->set('x');
std::puts(c.get()); // Prints Exxmple
std::puts(c.get_upper()); // Prints EXXMPLE
The problem is, the "proxy" type with the member functions get_source, get_upper, etc. has no obvious place it can be stored, and an iterator is required to return a reference to something, not a value. (vector<bool> has a similar problem)
Alternately I could expose some kind of shell container or range, or expose completely separate iterator begin/end functions. Does anyone have experience doing something like this and know what works well?
My personal approach to this sort of things is to use property maps: I envision a system of algorithms which can [optionally] take a property map (or actually sometimes multiple property maps) for each range. The idea is that *it yields a key (e.g., the T& it currently do) which is then used with a property map which transforms the key into the actually accessed value. The transformation can, e.g., be the identity yielding the current behavior of the algorithms and a good default to be used when there is no property map. The example above would look something like this:
auto const cursor = c.begin();
std::printf("%c\n", c.map_source()(*cursor));
std::printf("%c\n", c.map_upper()(*cursor));
c.map_source()(*cursor, 'x');
std::copy(c.map_source(), c, std::ostreambuf_iterator<char>(std::cout));
std::copy(c.map_upper(), c, std::ostreambuf_iterator<char>(std::cout));
std::copy([](unsigned char c)->char{ return std::toupper(c); }, c,
std::ostreambuf_iterator<char>(std::cout));
The code assumes that the property maps yielding the source and the capitalized characters are obtained using c.map_source() and c.map_upper(), respectively. The last variant using std::copy() uses a lambda function as a property map.
Sadly, I still haven't found the time to write up a coherent proposal to apply various improvements to the STL algorithms. ... nor do I have have an implementation putting it all together (I have a somewhat clunky implementation which is about 10 years old and doesn't benefit from various C++11 features which make it a lot easier; also, this implementation only concentrates on property maps and doesn't use the interface I currently envision).
I am no expert in C++ and STL.
I use a structure in a Map as data. Key is some class C1.
I would like to access the same data but using a different key C2 too (where C1 and C2 are two unrelated classes).
Is this possible without duplicating the data?
I tried searching in google, but had a tough time finding an answer that I could understand.
This is for an embedded target where boost libraries are not supported.
Can somebody offer help?
You may store pointers to Data as std::map values, and you can have two maps with different keys pointing to the same data.
I think a smart pointer like std::shared_ptr is a good option in this case of shared ownership of data:
#include <map> // for std::map
#include <memory> // for std::shared_ptr
....
std::map<C1, std::shared_ptr<Data>> map1;
std::map<C2, std::shared_ptr<Data>> map2;
Instances of Data can be allocated using std::make_shared().
Not in the Standard Library, but Boost offers boost::multi_index
Two keys of different types
I must admit I've misread a bit, and didn't really notice you want 2 keys of different types, not values. The solution for that will base on what's below, though. Other answers have pretty much what will be needed for that, I'd just add that you could make an universal lookup function: (C++14-ish pseudocode).
template<class Key>
auto lookup (Key const& key) { }
And specialize it for your keys (arguably easier than SFINAE)
template<>
auto lookup<KeyA> (KeyA const& key) { return map_of_keys_a[key]; }
And the same for KeyB.
If you wanted to encapsulate it in a class, an obvious choice would be to change lookup to operator[].
Key of the same type, but different value
Idea 1
The simplest solution I can think of in 60 seconds: (simplest meaning exactly that it should be really thought through). I'd also switch to unordered_map as default.
map<Key, Data> data;
map<Key2, Key> keys;
Access via data[keys["multikey"]].
This will obviously waste some space (duplicating objects of Key type), but I am assuming they are much smaller than the Data type.
Idea 2
Another solution would be to use pointers; then the only cost of duplicate is a (smart) pointer:
map<Key, shared_ptr<Data>> data;
Object of Data will be alive as long as there is at least one key pointing to it.
What I usually do in these cases is use non-owned pointers. I store my data in a vector:
std::vector<Data> myData;
And then I map pointers to each element. Since it is possible that pointers are invalidated because of the future growth of the vector, though, I will choose to use the vector indexes in this case.
std::map<Key1, int> myMap1;
std::map<Key2, int> myMap2;
Don't expose the data containers to your clients. Encapsulate element insertion and removal in specific functions, which insert everywhere and remove everywhere.
Bartek's "Idea 1" is good (though there's no compelling reason to prefer unordered_map to map).
Alternatively, you could have a std::map<C2, Data*>, or std::map<C2, std::map<C1, Data>::iterator> to allow direct access to Data objects after one C2-keyed search, but then you'd need to be more careful not to access invalid (erased) Data (or more precisely, to erase from both containers atomically from the perspective of any other users).
It's also possible for one or both maps to move to shared_ptr<Data> - the other could use weak_ptr<> if that's helpful ownership-wise. (These are in the C++11 Standard, otherwise the obvious source - boost - is apparently out for you, but maybe you've implemented your own or selected another library? Pretty fundamental classes for modern C++).
EDIT - hash tables versus balanced binary trees
This isn't particularly relevant to the question, but has received comments/interest below and I need more space to address it properly. Some points:
1) Bartek's casually advising to change from map to unordered_map without recommending an impact study re iterator/pointer invalidation is dangerous, and unwarranted given there's no reason to think it's needed (the question doesn't mention performance) and no recommendation to profile.
3) Relatively few data structures in a program are important to performance-critical behaviours, and there are plenty of times when the relative performance of one versus another is of insignificant interest. Supporting this claim - masses of code were written with std::map to ensure portability before C++11, and perform just fine.
4) When performance is a serious concern, the advice should be "Care => profile", but saying that a rule of thumb is ok - in line with "Don't pessimise prematurely" (see e.g. Sutter and Alexandrescu's C++ Coding Standards) - and if asked for one here I'd happily recommend unordered_map by default - but that's not particularly reliable. That's a world away from recommending every std::map usage I see be changed.
5) This container performance side-track has started to pull in ad-hoc snippets of useful insight, but is far from being comprehensive or balanced. This question is not a sane venue for such a discussion. If there's another question addressing this where it makes sense to continue this discussion and someone asks me to chip in, I'll do it sometime over the next month or two.
You could consider having a plain std::list holding all your data, and then various std::map objects mapping arbitrary key values to iterators pointing into the list:
std::list<Data> values;
std::map<C1, std::list<Data>::iterator> byC1;
std::map<C2, std::list<Data>::iterator> byC2;
I.e. instead of fiddling with more-or-less-raw pointers, you use plain iterators. And iterators into a std::list have very good invalidation guarantees.
I had the same problem, at first holding two map for shared pointers sound very cool. But you will still need to manage this two maps(inserting, removing etc...).
Than I came up with other way of doing this.
My reason was; accessing a data with x-y or radius-angle. Think like each point will hold data but point could be described as cartesian x,y or radius-angle .
So I wrote a struct like
struct MyPoint
{
std::pair<int, int> cartesianPoint;
std::pair<int, int> radianPoint;
bool operator== (const MyPoint& rhs)
{
if (cartesianPoint == rhs.cartesianPoint || radianPoint == rhs.radianPoint)
return true;
return false;
}
}
After that I could used that as key,
std::unordered_map<MyPoint, DataType> myMultIndexMap;
I am not sure if your case is the same or adjustable to this scenerio but it can be a option.
I have a set of data which in some occasion I need to sort them in one way and some occasion in another way. For example, suppose the data set is a set of strings,{"abc", "dfg",...}. Sometimes I need to sort them in alphabetic order and sometimes by comparing their length.
Initially I used std::set as a container of my data and implemented 2 comparators, hoping that I can change the comparator of the set on the fly, cause the data is huge and it's not a good idea to copy it from one set to another..I just want to sort it using different comparators from time to time. Is this possible or what's the right way to do it?
You have to specify the comparator of std::set at construction time.
As a solution, I would maintain two 'index' sets instead, each referring to the actual collection. This would yield the greatest flexibility. To keep everything together, I suggest you wrap it up in a single class:
// to be compiled, debugged etc..., but ideal
// to grab the idea
// caveats: maintain the index objects whenever the collection
// gets resized/reallocated etc...
// so not to be written yourself, use an existing library :)
template< typename T, typename comp1, typename comp2 >
struct MultiIndex {
std::deque<T> collection;
std::set<T*, comp1> index1;
std::set<T*, comp2> index2;
void insert( const T& t ){
collection.push_back(t);
index1.insert( &collection.back() );
index2.insert( &collection.back() );
}
};
Boost library has such a class: Multiindex.
The set is internally always kept sorted (otherwise you wouldn't have the needed performance), so no, the comparator can't be changed. What I think the best solution here is to maintain two sets with same data, but with different comparators. I'd encapsulate the two sets in a class and have the functions like insertion work on both sets to ensure the data is the same on both sets.
If you only don't need to have the data sorted all the time, another way to accomplish what you want would be to simply use e.g. a vector and sort by whichever comparator you need when necessary.
No, not on the fly. The tree is built based on the sort criteria specified at construction time. You are talking about building multiple indexes into a single dataset, which could be accomplished with multiple sets. There are probably lots of libs like boost which have something already created for this.
I've only recently started dwelling into boost and it's containers, and I read a few articles on the web and on stackoverflow that a boost::unordered_map is the fastest performing container for big collections.
So, I have this class State, which must be unique in the container (no duplicates) and there will be millions if not billions of states in the container.
Therefore I have been trying to optimize it for small size and as few computations as possible. I was using a boost::ptr_vector before, but as I read on stackoverflow a vector is only good as long as there are not that many objects in it.
In my case, the State descibes sensorimotor information from a robot, so there can be an enormous amount of states, and therefore fast lookup is of topemost priority.
Following the boost documentation for unordered_map I realize that there are two things I could do to speed things up: use a hash_function, and use an equality operator to compare States based on their hash_function.
So, I implemented a private hash() function which takes in State information and using boost::hash_combine, creates an std::size_t hash value.
The operator== compares basically the state's hash values.
So:
is std::size_t enough to cover billions of possible hash_function
combinations ? In order to avoid duplicate states I intend to use
their hash_values.
When creating a state_map, should I use as key the State* or the hash
value ?
i.e: boost::unordered_map<State*,std::size_t> state_map;
Or
boost::unordered_map<std::size_t,State*> state_map;
Are the lookup times with a boost::unordered_map::iterator =
state_map.find() faster than going through a boost::ptr_vector and
comparing each iterator's key value ?
Finally, any tips or tricks on how to optimize such an unordered map
for speed and fast lookups would be greatly appreciated.
EDIT: I have seen quite a few answers, one being not to use boost but C++0X, another not to use an unordered_set, but to be honest, I still want to see how boost::unordered_set is used with a hash function.
I have followed boost's documentation and implemented, but I still cannot figure out how to use the hash function of boost with the ordered set.
This is a bit muddled.
What you say are not "things that you can do to speed things up"; rather, they are mandatory requirements of your type to be eligible as the element type of an unordered map, and also for an unordered set (which you might rather want).
You need to provide an equality operator that compares objects, not hash values. The whole point of the equality is to distinguish elements with the same hash.
size_t is an unsigned integral type, 32 bits on x86 and 64 bits on x64. Since you want "billions of elements", which means many gigabytes of data, I assume you have a solid x64 machine anyway.
What's crucial is that your hash function is good, i.e. has few collisions.
You want a set, not a map. Put the objects directly in the set: std::unordered_set<State>. Use a map if you are mapping to something, i.e. states to something else. Oh, use C++0x, not boost, if you can.
Using hash_combine is good.
Baby example:
struct State
{
inline bool operator==(const State &) const;
/* Stuff */
};
namespace std
{
template <> struct hash<State>
{
inline std::size_t operator()(const State & s) const
{
/* your hash algorithm here */
}
};
}
std::size_t Foo(const State & s) { /* some code */ }
int main()
{
std::unordered_set<State> states; // no extra data needed
std::unordered_set<State, Foo> states; // another hash function
}
An unordered_map is a hashtable. You don't store the hash; it is done internally as the storage and lookup method.
Given your requirements, an unordered_set might be more appropriate, since your object is the only item to store.
You are a little confused though -- the equality operator and hash function are not truly performance items, but required for nontrivial objects for the container to work correctly. A good hash function will distribute your nodes evenly across the buckets, and the equality operator will be used to remove any ambiguity about matches based on the hash function.
std::size_t is fine for the hash function. Remember that no hash is perfect; there will be collisions, and these collision items are stored in a linked list at that bucket position.
Thus, .find() will be O(1) in the optimal case and very close to O(1) in the average case (and O(N) in the worst case, but a decent hash function will avoid that.)
You don't mention your platform or architecture; at billions of entries you still might have to worry about out-of-memory situations depending on those and the size of your State object.
forget about hash; there is nothing (at least from your question) that suggests you have a meaningful key;
lets take a step back and rephrase your actual performance goals:
you want to quickly validate no duplicates ever exist for any of your State objects
comment if i need to add others.
From the aforementioned goal, and from your comment i would suggest you use actually a ordered_set rather than an unordered_map. Yes, the ordered search uses binary search O(log (n)) while unordered uses lookup O(1).
However, the difference is that with this approach you need the ordered_set ONLY to check that a similar state doesn't exist already when you are about to create a new one, that is, at State creation-time.
In all the other lookups, you actually don't need to look into the ordered_set! because you already have the key; State*, and the key can access the value by the magic dereference operator: *key
so with this approach, you only are using the ordered_set as an index to verify States on creation time only. In all the other cases, you access your State with the dereference operator of your pointer-value key.
if all the above wasn't enough to convince you, here is the final nail in the coffin of the idea of using a hash to quickly determine equality; hash function has a small probability of collision, but as the number of states will grow, that probability will become complete certainty. So depending on your fault-tolerance, you are going to deal with state collisions (and from your question and the number of States you are expecting to deal, it seems you will deal with a lot of them)
For this to work, you obviously need the compare predicate to test for all the internal properties of your state (giroscope, thrusters, accelerometers, proton rays, etc.)
Python's itertools implement a chain iterator which essentially concatenates a number of different iterators to provide everything from single iterator.
Is there something similar in C++ ? A quick look at the boost libraries didn't reveal something similar, which is quite surprising to me. Is it difficult to implement this functionality?
Came across this question while investigating for a similar problem.
Even if the question is old, now in the time of C++ 11 and boost 1.54 it is pretty easy to do using the Boost.Range library. It features a join-function, which can join two ranges into a single one. Here you might incur performance penalties, as the lowest common range concept (i.e. Single Pass Range or Forward Range etc.) is used as new range's category and during the iteration the iterator might be checked if it needs to jump over to the new range, but your code can be easily written like:
#include <boost/range/join.hpp>
#include <iostream>
#include <vector>
#include <deque>
int main()
{
std::deque<int> deq = {0,1,2,3,4};
std::vector<int> vec = {5,6,7,8,9};
for(auto i : boost::join(deq,vec))
std::cout << "i is: " << i << std::endl;
return 0;
}
In C++, an iterator usually doesn't makes sense outside of a context of the begin and end of a range. The iterator itself doesn't know where the start and the end are. So in order to do something like this, you instead need to chain together ranges of iterators - range is a (start, end) pair of iterators.
Takes a look at the boost::range documentation. It may provide tools for constructing a chain of ranges. The one difference is that they will have to be the same type and return the same type of iterator. It may further be possible to make this further generic to chain together different types of ranges with something like any_iterator, but maybe not.
I've written one before (actually, just to chain two pairs of iterators together). It's not that hard, especially if you use boost's iterator_facade.
Making an input iterator (which is effectively what Python's chain does) is an easy first step. Finding the correct category for an iterator chaining a combination of different iterator categories is left as an exercise for the reader ;-).
Check Views Template Library (VTL). It may not provided 'chained iterator' directly. But I think it has all the necessary tools/templates available for implementing your own 'chained iterator'.
From the VTL Page:
A view is a container adaptor, that provides a container interface to
parts of the data or
a rearrangement of the data or
transformed data or
a suitable combination of the data sets
of the underlying container(s). Since views themselves provide the container interface, they can be easily combined and stacked. Because of template trickery, views can adapt their interface to the underlying container(s). More sophisticated template trickery makes this powerful feature easy to use.
Compared with smart iterators, views are just smart iterator factories.
What you are essentially looking for is a facade iterator that abstracts away the traversing through several sequences.
Since you are coming from a python background I'll assume that you care more about flexibility rather than speed. By flexibility I mean the ability to chain-iterate through different sequence types together (vector, array, linked list, set etc....) and by speed I mean only allocating memory from the stack.
If this is the case then you may want to look at the any_iterator from adobe labs:
http://stlab.adobe.com/classadobe_1_1any__iterator.html
This iterator will give you the ability to iterate through any sequence type at runtime. To chain you would have a vector (or array) of 3-tuple any_iterators, that is, three any_iterators for each range you chain together (you need three to iterate forward or backward, if you just want to iterate forward two will suffice).
Let's say that you wanted to chain-iterate through a sequence of integers:
(Untested psuedo-c++ code)
typedef adobe::any_iterator AnyIntIter;
struct AnyRange {
AnyIntIter begin;
AnyIntIter curr;
AnyIntIter end;
};
You could define a range such as:
int int_array[] = {1, 2, 3, 4};
AnyRange sequence_0 = {int_array, int_array, int_array + ARRAYSIZE(int_array)};
Your RangeIterator class would then have an std::vector.
<code>
class RangeIterator {
public:
RangeIterator() : curr_range_index(0) {}
template <typename Container>
void AddAnyRange(Container& c) {
AnyRange any_range = { c.begin(), c.begin(), c.end() };
ranges.push_back(any_range);
}
// Here's what the operator++() looks like, everything else omitted.
int operator++() {
while (true) {
if (curr_range_index > ranges.size()) {
assert(false, "iterated too far");
return 0;
}
AnyRange* any_range = ranges[curr_range_index];
if (curr_range->curr != curr_range->end()) {
++(curr_range->curr);
return *(curr_range->curr);
}
++curr_range_index;
}
}
private:
std::vector<AnyRange> ranges;
int curr_range_index;
};
</code>
I do want to note however that this solution is very slow. The better, more C++ like approach is just to store all the pointers to the objects that you want operate on and iterate through that. Alternatively, you can apply a functor or a visitor to your ranges.
Not in the standard library. Boost might have something.
But really, such a thing should be trivial to implement. Just make yourself an iterator with a vector of iterators as a member. Some very simple code for operator++, and you're there.
No functionality exists in boost that implements this, to the best of my knowledge - I did a pretty extensive search.
I thought I'd implement this easily last week, but I ran into a snag: the STL that comes with Visual Studio 2008, when range checking is on, doesn't allow comparing iterators from different containers (i.e., you can't compare somevec1.end() with somevec2.end() ). All of a sudden it became much harder to implement this and I haven't quite decided yet on how to do it.
I wrote other iterators in the past using iterator_facade and iterator_adapter from boost, which are better than writing 'raw' iterators but I still find writing custom iterators in C++ rather messy.
If someone can post some pseudocode on how this could be done /without/ comparing iterators from different containers, I'd be much obliged.