Constant time test for first element in intrusive rbtree

How can I efficiently tell if an element is at the beginning of an intrusive set or rbtree? I would like to define a simple function prev that returns a pointer to the previous item in a tree, or nullptr if there is no previous item. An analogous next function is easy to write, using iterator_to and comparing to end(). However, there is no equivalent reverse_iterator_to function that would allow me to compare to rend(). Moreover, I specifically do not want to compare to begin(), because that's not constant time in a red-black tree.
One thing that certainly seems to work is decrementing an iterator and comparing it to end(). That works fine with the implementation, but I can find no support for this in the documentation. What's the best way to implement prev in the following minimal working example?
#include <iostream>
#include <string>
#include <boost/intrusive/set.hpp>
using namespace std;
using namespace boost::intrusive;
struct foo : set_base_hook<> {
string name;
foo(const char *n) : name(n) {}
friend bool operator<(const foo &a, const foo &b) { return a.name < b.name; }
};
rbtree<foo> tree;
foo *
prev(foo *fp)
{
auto fi = tree.iterator_to(*fp);
return --fi == tree.end() ? nullptr : &*fi;
}
int
main()
{
tree.insert_equal(*new foo{"a"});
tree.insert_equal(*new foo{"b"});
tree.insert_equal(*new foo{"c"});
for (foo *fp = &*tree.find("c"); fp; fp = prev(fp))
cout << fp->name << endl;
}
Update: Okay, so what I was missing, which is probably what sehe was getting at indirectly, is that in the STL begin() is actually guaranteed to be constant time. So even though a generic red-black tree requires O(log n) time to find the minimum element, an STL std::map does not: the constant-time begin() requirement means implementations in practice cache the first element. And I think the point sehe is making is that even though Boost does not document this guarantee, it is fair to assume that boost::intrusive containers behave like STL containers. Given that assumption, it is perfectly fine to say:
foo *
prev(foo *fp)
{
auto fi = tree.iterator_to(*fp);
return fi == tree.begin() ? nullptr : &*--fi;
}
since the comparison to tree.begin() should be cheap.

You can get the reverse iterator from the result of iterator_to.
Also, note that there is rbtree<>::container_from_iterator(iterator it), so your prev function doesn't need to rely on a "global" tree.
You can just construct the corresponding reverse_iterator. You'll have to increment the iterator first, because a reverse_iterator refers to the element just before the iterator it was constructed from.
So my take on this would be (bonus: without memory leaks):
#include <boost/intrusive/set.hpp>
#include <iostream>
#include <string>
#include <vector>
using namespace boost::intrusive;
struct foo : set_base_hook<> {
std::string name;
foo(char const* n) : name(n) {}
bool operator<(const foo &b) const { return name < b.name; }
};
int main()
{
std::vector<foo> v;
v.emplace_back("a");
v.emplace_back("b");
v.emplace_back("c");
using Tree = rbtree<foo>;
Tree tree;
tree.insert_unique(v.begin(), v.end());
for (auto key : { "a", "b", "c", "missing" })
{
std::cout << "\nusing key '" << key << "': ";
auto start = tree.find(key); // already an iterator; end() if the key is absent (never dereference it)
if (start != tree.end()) {
for (auto it = Tree::reverse_iterator(++start); it != tree.rend(); ++it)
std::cout << it->name << " ";
}
}
}
Which prints
using key 'a': a
using key 'b': b a
using key 'c': c b a
using key 'missing':

Related

need STL set in insertion order

How can I store elements in a set in insertion order?
For example:
set<string> myset;
myset.insert("stack");
myset.insert("overflow");
If you print it, the output is:
overflow
stack
Needed output:
stack
overflow

One way is to use two containers, a std::deque to store the elements in insertion order, and another std::set to make sure there are no duplicates.
When inserting an element, check whether it's in the set first: if it is, throw it out; if it's not, insert it into both the deque and the set.
A common scenario is to insert all elements first and then process them with no further insertions; in that case the set can be freed once insertion is done.

A set is the wrong container for keeping insertion order, it will sort its element according to the sorting criterion and forget the insertion order. You have to use a sequenced container like vector, deque or list for that. If you additionally need the associative access set provides you would have to store your elements in multiple containers simultaneously or use a non-STL container like boost::multi_index which can maintain multiple element orders at the same time.
PS: If you sort the elements before inserting them in a set, the set will keep them in insertion order but I think that will not address your problem.
If you don't need any order besides the insertion order, you could also store the insert number in the stored element and make that the sorting criterion. However, why one would use a set in this case at all escapes me. ;)

Here's how I do it:
#include <cstddef>
#include <set>
#include <vector>
template <class T>
class VectorSet
{
public:
using iterator = typename std::vector<T>::iterator;
using const_iterator = typename std::vector<T>::const_iterator;
iterator begin() { return theVector.begin(); }
iterator end() { return theVector.end(); }
const_iterator begin() const { return theVector.begin(); }
const_iterator end() const { return theVector.end(); }
const T& front() const { return theVector.front(); }
const T& back() const { return theVector.back(); }
// append only if the set doesn't already contain the item
void insert(const T& item) { if (theSet.insert(item).second) theVector.push_back(item); }
std::size_t count(const T& item) const { return theSet.count(item); }
bool empty() const { return theSet.empty(); }
std::size_t size() const { return theSet.size(); }
private:
std::vector<T> theVector; // elements in insertion order
std::set<T> theSet;       // duplicate check / membership lookup
};
Of course, new forwarding functions can be added as needed, and can be forwarded to whichever of the two data structures implements them most efficiently. If you are going to make heavy use of STL algorithms on this (I haven't needed to so far) you may also want to define member types that the STL expects to find, like value_type and so forth.

If you can use Boost, a very straightforward solution is to use the header-only library Boost.Bimap (bidirectional maps).
Consider the following sample program, which displays your dummy entries in insertion order:
#include <iostream>
#include <string>
#include <type_traits>
#include <boost/bimap.hpp>
using namespace std::string_literals;
template <typename T>
void insertCallOrdered(boost::bimap<T, size_t>& mymap, const T& element) {
// We use size() as index, therefore indexing with 0, 1, ...
// as we add elements to the bimap.
mymap.insert({ element, mymap.size() });
}
int main() {
boost::bimap<std::string, size_t> mymap;
insertCallOrdered(mymap, "stack"s);
insertCallOrdered(mymap, "overflow"s);
// Iterate over right map view (integers) in sorted order
for (const auto& rit : mymap.right) {
std::cout << rit.first << " -> " << rit.second << std::endl;
}
}

I'm just wondering why nobody has suggested using such a nice library as Boost MultiIndex. Here's an example how to do that:
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/indexed_by.hpp>
#include <boost/multi_index/identity.hpp>
#include <boost/multi_index/sequenced_index.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <iostream>
template<typename T>
using my_set = boost::multi_index_container<
T,
boost::multi_index::indexed_by<
boost::multi_index::sequenced<>,
boost::multi_index::ordered_unique<boost::multi_index::identity<T>>
>
>;
int main() {
my_set<int> set;
set.push_back(10);
set.push_back(20);
set.push_back(3);
set.push_back(11);
set.push_back(1);
// Prints elements of the set in order of insertion.
const auto &index = set.get<0>();
for (const auto &item : index) {
std::cout << item << " ";
}
// Prints elements of the set in order of value.
std::cout << "\n";
const auto &ordered_index = set.get<1>();
for (const auto &item : ordered_index) {
std::cout << item << " ";
}
}

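This uses only the standard library: std::unordered_set. Note, however, that std::unordered_set does not guarantee insertion order; its iteration order depends on the hash buckets, so the output may happen to match insertion order on a given implementation but is not guaranteed to. Example online compiler link: http://cpp.sh/7hsxo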
#include <cstdio>
#include <iostream>
#include <string>
#include <unordered_set>
static std::unordered_set<std::string> myset;
int main()
{
myset.insert("blah");
myset.insert("blah2");
myset.insert("blah3");
int count = 0;
for ( auto local_it = myset.begin(); local_it!= myset.end(); ++local_it ) {
printf("index: [%d]: %s\n", count, (*local_it).c_str());
count++;
}
printf("\n");
for ( unsigned i = 0; i < myset.bucket_count(); ++i) {
for ( auto local_it = myset.begin(i); local_it!= myset.end(i); ++local_it )
printf("bucket: [%u]: %s\n", i, (*local_it).c_str());
}
}

C++ storing a value in an unordered pair

I want to store a floating point value for an unordered pair of integers. I am unable to find any easy-to-understand tutorials for this. E.g. for the unordered pair {i,j} I want to store a floating point value f. How do I insert, store and retrieve values like this?

A simple way to handle unordered int pairs is to use std::minmax(i,j) to generate a std::pair<int,int>. This way you can implement your storage like this:
std::map<std::pair<int,int>,float> storage;
storage[std::minmax(i,j)] = 0.f;
storage[std::minmax(j,i)] = 1.f; // overwrites the value stored for {i,j} above
Admittedly, a hash-based container (std::unordered_map with a suitable hash) could give you some extra performance, but there is little harm in postponing this kind of optimization.

Here's some indicative code:
#include <cstddef>
#include <iostream>
#include <unordered_map>
#include <utility>
struct Hasher
{
std::size_t operator()(const std::pair<int, int>& p) const
{
return p.first ^ (p.second << 7) ^ (p.second >> 3);
}
};
int main()
{
std::unordered_map<std::pair<int,int>, float, Hasher> m =
{ { {1,3}, 2.3 },
{ {2,3}, 4.234 },
{ {3,5}, -2 },
};
// do a lookup
std::cout << m[std::make_pair(2,3)] << '\n';
// add more data
m[std::make_pair(65,73)] = 1.23;
// output everything (unordered)
for (auto& x : m)
std::cout << x.first.first << ',' << x.first.second
<< ' ' << x.second << '\n';
}
Note that it relies on the convention that you store the unordered pairs with the lower number first (if they're not equal). You might find it convenient to write a support function that takes a pair and returns it in that order, so you can use that function when inserting new values in the map and when using a pair as a key for trying to find a value in the map.
Output:
4.234
3,5 -2
1,3 2.3
65,73 1.23
2,3 4.234
See it on ideone.com. If you want to make a better hash function, just hunt down an implementation of hash_combine (or use boost's) - plenty of questions here on SO explaining how to do that for std::pair<>s.

Implement a type UPair with your requirements and specialize ::std::hash for it (one of the rare cases where you are allowed to add something to namespace std).
#include <algorithm>
#include <cstddef>
#include <functional>
#include <utility>
#include <unordered_map>
template <typename T>
class UPair {
private:
::std::pair<T,T> p;
public:
UPair(T a, T b) : p(::std::min(a,b),::std::max(a,b)) {
}
UPair(::std::pair<T,T> pair) : p(::std::min(pair.first,pair.second),::std::max(pair.first,pair.second)) {
}
friend bool operator==(UPair const& a, UPair const& b) {
return a.p == b.p;
}
operator ::std::pair<T,T>() const {
return p;
}
};
namespace std {
template <typename T>
struct hash<UPair<T>> {
::std::size_t operator()(UPair<T> const& up) const {
return ::std::hash<::std::size_t>()(
::std::hash<T>()(::std::pair<T,T>(up).first)
) ^
::std::hash<T>()(::std::pair<T,T>(up).second);
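// the double hash is meant to avoid the likely case of equal .first and .second XOR-ing to 0,
// which would hurt the unordered_map's performance; note that where std::hash of an integer is
// the identity (libstdc++ does this), hashing twice changes nothing, so a hash_combine-style mix is safer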
}
};
}
int main() {
::std::unordered_map<UPair<int>,float> um;
um[UPair<int>(3,7)] = 3.14;
um[UPair<int>(8,7)] = 2.71;
return 10*um[::std::make_pair(7,3)]; // correctly returns 31
}

boost::any_range and operator []

Consider the following code:
#include <boost/range.hpp>
#include <boost/range/any_range.hpp>
#include <boost/range/join.hpp>
#include <iostream>
#include <algorithm>
#include <string>
#include <vector>
#include <list>
struct TestData {
TestData() : m_strMem01("test"), m_intMem02(42), m_boolMem03(true) {}
std::string m_strMem01;
int m_intMem02;
bool m_boolMem03;
};
struct IntComp {
bool operator()(const TestData &s, int i) { return s.m_intMem02 < i; }
bool operator()(int i, const TestData &s) { return i < s.m_intMem02; }
bool operator()(const TestData &i, const TestData &s) {
return i.m_intMem02 < s.m_intMem02;
}
};
struct StrComp {
bool operator()(const TestData &s, const std::string &str) {
return s.m_strMem01 < str;
}
bool operator()(const std::string &str, const TestData &s) {
return str < s.m_strMem01;
}
bool operator()(const TestData &i, const TestData &s) {
return i.m_strMem01 < s.m_strMem01;
}
};
typedef boost::any_range<TestData, boost::forward_traversal_tag,
const TestData &, std::ptrdiff_t> TestRange;
std::vector<TestData> vecData(10);
std::list<TestData> listData(20);
TestRange foo() {
TestRange retVal;
auto tmp1 = std::equal_range(vecData.cbegin(), vecData.cend(), 42, IntComp());
retVal = boost::join(retVal, tmp1);
auto tmp2 =
std::equal_range(listData.cbegin(), listData.cend(), "test", StrComp());
retVal = boost::join(retVal, tmp2);
return retVal;
}
int main(int argc, char *argv[]) {
auto res = foo();
for (auto a : res) {
std::cout << a.m_strMem01 << std::endl;
}
//std::cout << res[4].m_intMem02 << std::endl;
}
If you uncomment the last line, the code fails to compile because distance_to is not implemented for any_forward_iterator_interface. I'm not sure what exactly I'm missing here: implementing operator[] or distance_to, but for what? My own traversal tag? And why doesn't it work in the first place?

I would say the answer depends on your performance needs and how much effort you are willing to put into implementing a new iterator abstraction. The core reason your [] operator doesn't work is that std::list<...> does not provide a random access iterator (only a bidirectional one). Had you chosen containers that provide such an iterator, your any_range<...> could have used random_access_traversal_tag and everything would be fine.
I think it's fair to say that it is not a big deal to implement a random access iterator on top of a list by simply tracking the current index and counting forward or backward within the list whenever a specific position is accessed, but that clearly goes against the nature of the list performance-wise.
Is there a good reason to hold one of the collections in a list?
Is there a good reason to access the resulting any_range randomly?
Is it worth the effort to provide an inefficient random access interface for std::list?

Of course any_iterator (which underlies the any_range implementation) doesn't gratuitously emulate RandomAccess iterators for any odd iterator you pass.
If you want that, just make an iterator adaptor that does this (making it very slow to random access elements in a list - so don't do this).

How to orderly traverse a Boost.Heap Priority Queue and update a given element?

I'm looking for a good data structure that can maintain its elements sorted. Currently I'm trying Boost.Heap.
I frequently need to traverse the data structure in order and, when reaching an element based on some property, update its priority. Boost.Heap priority queues provide ordered and non-ordered iterators. Element updates occur through a node handle; a handle can be obtained from an ordinary non-ordered iterator, but not directly from an ordered one, as in the following example:
#include <iostream>
#include <algorithm>
#include <boost/heap/fibonacci_heap.hpp>
using namespace boost::heap;
int main()
{
fibonacci_heap<int> fib_heap;
fib_heap.push(1);
fib_heap.push(2);
fib_heap.push(3);
for(auto i = fib_heap.ordered_begin(); i != fib_heap.ordered_end(); ++i)
{
// no viable conversion here
auto h = fibonacci_heap<int>::s_handle_from_iterator(i);
if(*h == 2) // dumb test
{
fib_heap.increase(h, *h + 2);
break;
}
}
std::for_each(fib_heap.ordered_begin(), fib_heap.ordered_end(),
[](const int &e)
{
std::cout << e << std::endl;
});
}
How can I orderly traverse the queue and update an element in the traversal?
Note that I leave traversal after the update.
(Suggestions of alternative libraries for such purpose are welcome)

If I find no better alternative, I'll need to save the handle inside each corresponding element for later usage (c++1y code):
#include <iostream>
#include <algorithm>
#include <boost/heap/fibonacci_heap.hpp>
using namespace boost::heap;
template<typename T>
struct heap_data
{
typedef typename fibonacci_heap<heap_data>::handle_type handle_t;
handle_t handle;
T data;
heap_data(const T &data_) : data(data_) {}
bool operator<(heap_data const & rhs) const
{
return data < rhs.data;
}
};
void setup_handle(fibonacci_heap<heap_data<int>>::handle_type &&handle)
{
(*handle).handle = handle;
}
int main()
{
fibonacci_heap<heap_data<int>> heap;
setup_handle(heap.emplace(1));
setup_handle(heap.emplace(2));
setup_handle(heap.emplace(3));
std::find_if(heap.ordered_begin(), heap.ordered_end(),
[&heap](const heap_data<int> &e)
{
if(e.data == 2)
{
const_cast<heap_data<int> &>(e).data += 2;
heap.increase(e.handle);
return true;
}
return false;
});
std::for_each(heap.ordered_begin(), heap.ordered_end(),
[](const heap_data<int> &e)
{
std::cout << e.data << std::endl;
});
}

Your requirements are not very clear to me, but how about std::multimap or std::multiset? Update operations are O(log n). A full in-order traversal is O(n) overall, since incrementing a tree iterator takes amortized constant time (the standard requires this of all iterator operations). By comparison, boost::heap's ordered traversal looks like amortized O(n log n).

c++ std::vector search for value

I am attempting to optimize a std::vector "search": index-based iteration through a vector, returning an element that matches a "search" criterion.
struct myObj {
int id;
char* value;
};
std::vector<myObj> myObjList;
I create a few thousand entries with unique ids and values and push them to the vector myObjList.
What is the most efficient way to retrieve myObj that matches the id.
Currently I am index iterating like:
for(int i = 0; i < myObjList.size(); i++){
if(myObjList.at(i).id == searchCriteria){
return myObjList.at(i);
}
}
Note: searchCriteria is an int. All the elements have unique ids.
The above does the job, but is probably not the most efficient way.

The C++ standard library has some abstract algorithms, which give C++ a kind of functional flavour, as I call it, which lets you concentrate more on the criteria of your search than on how you implement the search itself. This applies to a lot of other algorithms.
The algorithm you are looking for is std::find_if, a simple linear search through an iterator range.
In C++11, you can use a lambda to express your criteria:
std::find_if(myObjList.begin(), myObjList.end(), [&](const myObj & o) {
return o.id == searchCriteria;
});
When not having C++11 available, you have to provide a predicate (function object (=functor) or function pointer) which returns true if the provided instance is the one you are looking for. Functors have the advantage that they can be parameterized, in your case you want to parameterize the functor with the ID you are looking for.
template<class TargetClass>
class HasId {
int _id;
public:
HasId(int id) : _id(id) {}
bool operator()(const TargetClass & o) const {
return o.id == _id;
}
};
std::find_if(myObjList.begin(), myObjList.end(), HasId<myObj>(searchCriteria));
This method returns an iterator pointing to the first element found which matches your criteria. If there is no such element, the end iterator is returned (which points past the end of the vector, not to the last element). So your function could look like this:
vector<myObj>::iterator it = std::find_if(...);
if(it == myObjList.end())
// handle error in any way
else
return *it;

Use std::find_if.
There's an example on the referenced page.
Here's a working example that more precisely fits your question:
#include <iostream>
#include <algorithm>
#include <vector>
using namespace std;
struct myObj
{
int id;
char* value;
myObj(int id_) : id(id_), value(0) {}
};
struct obj_finder
{
obj_finder(int key) : key_(key)
{}
bool operator()(const myObj& o) const
{
return key_ == o.id;
}
const int key_;
};
int main () {
vector<myObj> myvector;
vector<myObj>::iterator it;
myvector.push_back(myObj(30));
myvector.push_back(myObj(50));
myvector.push_back(myObj(100));
myvector.push_back(myObj(32));
it = find_if (myvector.begin(), myvector.end(), obj_finder(100));
cout << "I found " << it->id << endl;
return 0;
}
And, if you have C++11 available, you can make this even more concise using a lambda:
#include <iostream>
#include <algorithm>
#include <vector>
using namespace std;
struct myObj
{
int id;
char* value;
myObj(int id_) : id(id_), value(0) {}
};
int main ()
{
vector<myObj> myvector;
vector<myObj>::iterator it;
myvector.push_back(myObj(30));
myvector.push_back(myObj(50));
myvector.push_back(myObj(100));
myvector.push_back(myObj(32));
int key = 100;
it = find_if (myvector.begin(), myvector.end(), [key] (const myObj& o) -> bool {return o.id == key;});
cout << "I found " << it->id << endl;
return 0;
}

This isn't really an answer to your question. The other people who answered gave pretty good answers, so I have nothing to add to them.
I would like to say though that your code is not very idiomatic C++. Really idiomatic C++ would, of course, use ::std::find_if. But even if you didn't have ::std::find_if your code is still not idiomatic. I'll provide two re-writes. One a C++11 re-write, and the second a C++03 re-write.
First, C++11:
for (auto &i: myObjList){
if(i.id == searchCriteria){
return i;
}
}
Second, C++03:
for (::std::vector<myObj>::iterator i = myObjList.begin(); i != myObjList.end(); ++i){
if(i->id == searchCriteria){
return *i;
}
}
The standard way of going through any sort of C++ container is to use an iterator. It's nice that vectors can be indexed by integer. But if you rely on that behavior unnecessarily you make it harder on yourself if you should change data structures later.

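If the ids are sorted, you can perform a binary search (the standard library has std::binary_search, though it only tells you whether the value exists; std::lower_bound gives you an iterator to the element). If they are not sorted, a single lookup can't do better than a linear scan, but you can still write the code more concisely with the standard library (use std::find_if).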
