Container with key and sorting criteria separate - c++

I want to have a collection of items which are searchable based on a key (an unsigned value), but I want the elements to be sorted based on a different criteria i.e. the last accessed time (Which is part of the value).
How can I achieve this in C++? I can sort them separately on demand, but can I create the container itself such that sorting happens automatically?
Are there ready made containers (in boost) that can have similar feature built into them?

You could probably implement something of this kind, using std::list and std::unordered_map pointing to each other.
#include <list>
#include <unordered_map>
template <typename A>
struct Cache {
using key = unsigned;
struct Composite {
Composite(A &_a, std::list<key>::iterator _it) : a(_a), it(_it) {}
A &a;
std::list<key>::iterator it;
};
std::unordered_map<key, Composite> map;
std::list <key> list;
void insert(key k, A &a) { // Assuming inserting contains accessing
list.emplace_front(k);
map[k] = Composite(a, list.front());
}
A &operator[](key k) {
list.erase(map[k].it);
list.emplace_front(k);
return map[k].a;
}
A &last_accessed() { // or whatever else you wish to implement
assert(!list.empty());
return map[list.front()].a;
}
};
This solution is optimized for keeping track of which element was accessed last. If you want to sort given a different attribute, you can follow a similar process but use an std::set to store the values with your comparison function, and then iterators to that from an std::unordered_map hashed with a key of your choice.

Related

C++: 'unique vector' data structure

I need a data structure like std::vector or std::list whose elements will be unique. In most of time I will call push_back on it, sometimes maybe erase. When I insert an element which is already there, I need to be notified either by some boolean or exception.
And the most important property it should have: the order of insertions. Each time I iterate over it, it should return elements in the order they were inserted.
We can think other way: a queue which guarantees the uniqueness of elements. But I don't want to pop elements, instead I want to iterate over them just like we do for vector or list.
What is the best data structure for my needs?
You can use a std::set
It will return a pair pair<iterator,bool> when the insert method is called. The bool in the pair is false when the element already exists in the set (the element won't be added in that case).
Use a struct with a regular std::vector and a std::set.
When you push, check the set for existence of the element. When you need to iterate, iterate over the vector. If you need to erase from the vector, also erase from the set.
Basically, use the set as an aside, only for fast "presence of an element" check.
// either make your class a template or use a fixed type of element
class unique_vector
{
public:
// implement the various operator you need like operator[]
// alternatively, consider inheriting from std::vector
private:
std::set<T> m_set; // fast lookup for existence of elements
std::vector<T> m_vector; // vector of elements
};
I would prefer using std::unordered_set to stores existing elements in a std::vector and it has faster lookup time of O(1), while the lookup time of std::set is O(logn).
You can use Boost.MultiIndex for this:
Live On Coliru
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/sequenced_index.hpp>
#include <boost/multi_index/hashed_index.hpp>
#include <boost/multi_index/identity.hpp>
using namespace boost::multi_index;
template<typename T>
using unique_list=multi_index_container<
T,
indexed_by<
sequenced<>,
hashed_unique<identity<T>>
>
>;
#include <iostream>
int main()
{
unique_list<int> l;
auto print=[&](){
const char* comma="";
for(const auto& x:l){
std::cout<<comma<<x;
comma=",";
}
std::cout<<"\n";
};
l.push_back(0);
l.push_back(1);
l.push_back(2);
l.push_back(0);
l.push_back(2);
l.push_back(4);
print();
}
Output
0,1,2,4

How can I implement a custom C++ iterator that efficiently iterates over key-value pairs that are backed by two distinct arrays

I want to make use of libstdc++'s __gnu_parallel::multiway_merge to merge four large sequences of sorted key-value pairs at once (to save memory bandwidth).
Each sequence of key-value pairs is represented by two distinct arrays, such that
values[i] is the value associated with keys[i].
The implementation for multiway-merging a single array of keys (keys-only) or an array of std::pairs would be straightforward. However, I need to implement a custom iterator that I can pass to the multiway_merge, which holds one reference to my keys array and one to the corresponding values array.
So my approach looks something like the following:
template<
typename KeyT,
typename ValueT
>
class ForwardIterator : public std::iterator<std::forward_iterator_tag, KeyT>
{
KeyT* k_itr;
ValueT* v_itr;
size_t offset;
explicit ForwardIterator(KeyT* k_start, ValueT *v_start) : k_itr(k_start), v_itr(v_start), offset(0)
{
}
ForwardIterator& operator++ () // Pre-increment
{
offset++;
return *this;
}
}
However, the problems start as soon as I'm getting to the overloading of the dereferencing operator.
Help is really appreciated! Thanks!

STL container that combines sequential ordering with constant access time?

This strikes me as something I should have been able to find on Stackoverflow, but maybe I'm searching for the wrong terms here.
I have the scenario where there is a class
class Foo
{
int key;
int b;
...
}
and I want to push new elements of that class onto a list. Number of items is unknown beforehand.
At the same time, I want to be able to quickly check the existence of (and retrieve) an element with a certain key, e.g. key==5.
So, to summarize:
It should have O(1) (or thereabouts) for existence/retrieval/deletion
It should retain the order of the pushed items
One solution to this strikes me a "use a std::list to store the items, and std::unordered_map to retrieve them, with some book-keeping". I could of course implement it myself, but was wondering whether something like this already exists in a convenient form in STL.
EDIT: To preempt the suggestion, std::map is not a solution, because it orders based on some key. That is, it won't retain the ordering in which I pushed the items.
STL does not have the kind of container, you may write your own using std::unordered_map and std::list. Map your keys to the list iterators in std::unordered_map and store key-value pairs in std::list.
void add(const Key& key, const Value& value)
{
auto iterator = list.insert(list.end(), std::pair<Key, Value>(key, value));
map[key] = iterator;
}
void remove(const Key& key)
{
auto iterator = map[key];
map.erase(key);
list.erase(iterator);
}
Or use boost multi-index container
http://www.boost.org/doc/libs/1_57_0/libs/multi_index/doc/tutorial/index.html

Replacing std::map with std::set and search by index

Say we have a map with larger objects and an index value. The index value is also part of the larger object.
What I would like to know is whether it is possible to replace the map with a set, extracting the index value.
It is fairly easy to create a set that sorts on a functor comparing two larger objects by extracting the index value.
Which leaves searching by index value, which is not supported by default in a set, I think.
I was thinking of using std::find_if, but I believe that searches linearly, ignoring the fact we have set.
Then I thought of using std::binary_search with a functor comparing the larger object and the value, but I believe that it doesn't work in this case as it wouldn't make use of the structure and would use traversal as it doesn't have a random access iterator. Is this correct? Or are there overloads which correctly handle this call on a set?
And then finally I was thinking of using a boost::containter::flat_set, as this has an underlying vector and thus presumably should be able to work well with std::binary_search?
But maybe there is an all together easier way to do this?
Before you answer just use a map where a map ought to be used - I am actually using a vector that is manually sorted (well std::lower_bound) and was thinking of replacing it with boost::containter::flat_set, but it doesn't seem to be easily possible to do so, so I might just stick with the vector.
C++14 will introduce the ability to lookup by a key that does not require the construction of the entire stored object. This can be used as follows:
#include <set>
#include <iostream>
struct StringRef {
StringRef(const std::string& s):x(&s[0]) { }
StringRef(const char *s):x(s) { std::cout << "works: " << s << std::endl; }
const char *x;
};
struct Object {
long long data;
std::size_t index;
};
struct ObjectIndexer {
ObjectIndexer(Object const& o) : index(o.index) {}
ObjectIndexer(std::size_t index) : index(index) {}
std::size_t index;
};
struct ObjComp {
bool operator()(ObjectIndexer a, ObjectIndexer b) const {
return a.index < b.index;
}
typedef void is_transparent; //Allows the comparison with non-Object types.
};
int main() {
std::set<Object, ObjComp> stuff;
stuff.insert(Object{135, 1});
std::cout << stuff.find(ObjectIndexer(1))->data << "\n";
}
More generally, these sorts of problems where there are multiple ways of indexing your data can be solved using Boost.MultiIndex.
Use boost::intrusive::set which can utilize the object's index value directly. It has a find(const KeyType & key, KeyValueCompare comp) function with logarithmic complexity. There are also other set types based on splay trees, AVL trees, scapegoat trees etc. which may perform better depending on your requirements.
If you add the following to your contained object type:
less than operator that only compares the object indices
equality operator that only compares the object indices
a constructor that takes your index type and initializes a dummy object with that value for the index
then you can pass your index type to find, lower_bound, equal_range, etc... and it will act the way you want. When you pass your index to the set's (or flat_set's) find methods it will construct a dummy object of the contained type to use for the comparisons.
Now if your object is really big, or expensive to construct, this might not be the way you want to go.

Why does `std::unordered_map` "speak like the Yoda" - re-arrange elements?

When trying to write the std::string keys of an std::unordered_map in the following example, the keys get written in a different order than the one given by the initializer list:
#include <iostream>
#include <unordered_map>
class Data
{
typedef std::unordered_map<std::string, double> MapType;
typedef MapType::const_iterator const_iterator;
MapType map_;
public:
Data(const std::initializer_list<std::string>& i)
{
int counter = 0;
for (const auto& name : i)
{
map_[name] = counter;
}
}
const_iterator begin() const
{
return map_.begin();
}
const_iterator end() const
{
return map_.end();
}
};
std::ostream& operator<<(std::ostream& os, const Data& d)
{
for (const auto& pair : d)
{
os << pair.first << " ";
}
return os;
}
using namespace std;
int main(int argc, const char *argv[])
{
Data d = {"Why", "am", "I", "sorted"};
// The unordered_map speaks like Yoda.
cout << d << endl;
return 0;
}
I expected to see 'Why am I sorted', but I got a Yoda-like output:
sorted I am Why
Reading on the unordered_map here, I saw this:
Internally, the elements are not sorted in any particular order, but organized into buckets. Which bucket an element is placed into depends entirely on the hash of its key. This allows fast access to individual elements, since once hash is computed, it refers to the exact bucket the element is placed into.
Is this why the elements are not ordered in the same way as in the initializer list?
What data structure do I then use when I want the keys to be ordered in the same way as the initializer list? Should I internally keep a vector of strings to somehow save the argument order? Can the bucket organization be turned off somehow by choosing a specific hashing function?
What data structure do I then use when I want the keys to be ordered in the same way as the initializer list? Should I internally keep a vector of strings to somehow save the argument order?
Maybe all you want is actually a list/vector of (key, value) pairs?
If you want both O(1) lookup (hashmap) and iteration in the same order as insertion - then yes, using a vector together with an unordered_map sounds like a good idea. For example, Django's SortedDict (Python) does exactly that, here's the source for inspiration:
https://github.com/django/django/blob/master/django/utils/datastructures.py#L122
Python 2.7's OrderedDict is a bit more fancy (map values point to doubly-linked list links), see:
http://code.activestate.com/recipes/576693-ordered-dictionary-for-py24/
I'm not aware of an existing C++ implementation in standard libs, but this might get you somewhere. See also:
a C++ hash map that preserves the order of insertion
A std::map that keep track of the order of insertion?
unordered_map is, by definition, unordered, so you shall not expect any ordering when accessing the map sequentially.
If you don't want elements sorted by the key value, just use a container that keeps your order of insertion, be it a vector, deque, list or whatever, of pair<key, value> element type if you insist on using it.
Then, if an alement B is added after element A, it will always appear later. This holds true for initializer_list initialization as well.
You could probably use something like Boost.MultiIndex to keep it both sorted by insertion order and arbitrary key.