C++ map set hybrid


I have folder information in
struct folder
{
int id;
int folder_count;
long long size;
};
I need to keep folders (there can be 1000 or more) sorted by their folder_count and size, in that order: the folder with the highest folder_count must come first, and folders with the same folder_count must be sorted by size.
I have achieved it by custom comparator
struct folder_comparator
{
bool operator() (const folder& a, const folder& b) const
{
return a.folder_count>b.folder_count || (a.folder_count==b.folder_count && a.size>=b.size);
}
};
and putting folders into set
set <folder, folder_comparator> folders;
But in the meantime the folders get many updates. I can handle those with a map whose key is the id of the folder.
map<int, folder> folders;
But in this case I cannot keep the custom order (mentioned above).
I need both. (keep custom order and O(1) or at least O(log(N)) complexity search)
What data structure or hybrid of set and map can help me in this situation?

I need both. (keep custom order and O(1) or at least O(log(N)) complexity search)
This means Boost Multi-Index will most probably fit your need perfectly, with a hashed index on id and a non-unique sorted index using your (fixed) comparator. It has a learning curve, but it seems it would be worth the effort in your case (rather than maintaining 2 independent containers).
PS: your current comparator does not meet the requirement of strict weak ordering, which all sorted standard and Boost containers rely on (because of >=, it returns true when comparing a folder with itself). You need to fix it in either case. The easiest way to provide a proper comparator is to use std::tie:
bool operator() (const folder& a, const folder& b) const
{
return std::tie( a.folder_count, a.size ) > std::tie( b.folder_count, b.size );
}
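If Boost is not an option, the "two independent containers" alternative mentioned above can be sketched with the standard library alone. This is only a minimal sketch: FolderIndex, upsert, find and ordered are hypothetical names, and the id is added to the comparator as a tie-breaker (an assumption not present in the question) so that two different folders with equal folder_count and size can coexist in the set.

```cpp
#include <cassert>
#include <set>
#include <tuple>
#include <unordered_map>

struct folder {
    int id;
    int folder_count;
    long long size;
};

// Strict weak ordering: larger folder_count first, then larger size.
// The id is an added tie-breaker (assumption) so distinct folders
// never compare equivalent.
struct folder_comparator {
    bool operator()(const folder& a, const folder& b) const {
        return std::tie(b.folder_count, b.size, a.id) <
               std::tie(a.folder_count, a.size, b.id);
    }
};

// Hypothetical wrapper: hash lookup by id plus an ordered view.
class FolderIndex {
public:
    using ordered_set = std::set<folder, folder_comparator>;

    // Insert a new folder or replace an existing one: O(log N).
    void upsert(const folder& f) {
        auto it = by_id_.find(f.id);
        if (it != by_id_.end()) {
            ordered_.erase(it->second);             // drop the stale entry
            it->second = ordered_.insert(f).first;  // re-insert at its new position
        } else {
            by_id_[f.id] = ordered_.insert(f).first;
        }
    }

    // Average O(1) lookup by id; returns nullptr when the id is unknown.
    const folder* find(int id) const {
        auto it = by_id_.find(id);
        return it == by_id_.end() ? nullptr : &*it->second;
    }

    // Folders in the custom order (most sub-folders first).
    const ordered_set& ordered() const { return ordered_; }

private:
    ordered_set ordered_;
    std::unordered_map<int, ordered_set::const_iterator> by_id_;
};
```

An update is an erase followed by an insert because elements of a std::set are immutable once stored. The map keeps iterators into the set, which stay valid until the element they point to is erased.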

Related

Sorting structures inside vector by two criteria in alphabetical order

I have a following data structure (first string as "theme" of the school)
map<string, vector<School>> information;
And the school is:
struct School {
string name;
string location;
};
I have trouble printing my whole data structure out in alphabetical order (first theme, then location, then name). For an example.
"map key string : struct location : struct name"
"technology : berlin : university_of_berlin"
So far I managed to loop through the initial map by
for (auto const key:information) {
//access to struct
vector<School> v = key.second;
//sorting by location name
//comparison done by a separate function that returns school.location1 < school.location2
sort(v.begin(), v.end(), compare);
If I print out the theme (key.first) and v.location, it's almost complete. The map is ordered by default and the location comparison works. But I can't figure out how to add a second comparison by name. If I do another sort, this time by name, then I lose the original order by location. Is it somehow possible to do a "double sort" where one criterion is more important than the other?

You can; you just need to add a second condition to compare:
bool compare(School const& lhs, School const& rhs)
{
    if (lhs.location != rhs.location)
        return lhs.location < rhs.location;
    return lhs.name < rhs.name;
}
Or you can overload the < operator like @ceorron did.

There is a simple answer to this, I assume that you want to order first by "location" then "name".
The simple way is to implement a less operator in the struct School structure.
Example code:
//in School.hpp
struct School {
string name;
string location;
bool operator<(const School& rhs) const;
};
//in School.cpp
bool School::operator<(const School& rhs) const {
if(this->location < rhs.location)
return true;
if(rhs.location < this->location)
return false;
if(this->name < rhs.name)
return true;
if(rhs.name < this->name)
return false;
return false;
}
There are other ways, though. With this operator in place, you can now call sort without a comparator:
sort(v.begin(), v.end());

I am adding this answer just to be pedantic. See JustANewbie’s response for what is correct for this particular case (and I would say in most normal cases).
It is totally possible to perform multiple-pass sorts. The trick is to use a stable sorting method for each additional pass. (A stable sort preserves relative ordering of equivalent elements.)
std::sort is typically implemented as Introsort, which is not a stable sort (it uses Quicksort + Insertion Sort but switches to Heapsort if the Quicksort would take longer). The standard does not require std::sort to be stable.
Conveniently, the Standard Library provides us the std::stable_sort algorithm for when we need a stable sort.
A stable sort is typically slower than a non-stable sort, which is why we tend to prefer the non-stable sort whenever possible. The first pass you can use a non-stable sort, but you must use a stable sort for all additional passes.
std::sort ( xs.begin(), xs.end(), compare_names ); // 1st pass: secondary criterion
std::stable_sort( xs.begin(), xs.end(), compare_locations ); // 2nd pass: primary criterion
The final order will be sorted primarily by location and secondarily by name.
You can add as many sorting passes as you need. Just remember that you apply the passes in reverse order of their significance. For example, if you want to sort people by (last name, first name, age), you must apply the sorting in reverse order: age, first name, last name.
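Applied to the School structure from the question, the two passes might look like the sketch below. The function name sort_by_location_then_name is hypothetical, and lambdas stand in for the compare_names and compare_locations functions mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct School {
    std::string name;
    std::string location;
};

// Orders schools by location first and by name second, using two passes.
inline void sort_by_location_then_name(std::vector<School>& xs) {
    // 1st pass: secondary criterion; stability is not needed yet.
    std::sort(xs.begin(), xs.end(),
              [](const School& a, const School& b) { return a.name < b.name; });
    // 2nd pass: primary criterion; stable_sort keeps the name order
    // among schools that share a location.
    std::stable_sort(xs.begin(), xs.end(),
                     [](const School& a, const School& b) { return a.location < b.location; });
}
```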

C++ fixed-capacity associate container

I am looking for a container like std::unordered_map that does not use any dynamic allocation. I believe this would be possible for any associative container with a fixed number of keys, or even keys that have to be chosen at compile time.
I am not looking for a constexpr or compile time hash map because I would like to be able to update the values in the map.
Example use case:
FixedCapacityMap<std::string_view, int> fruits {
{"n_apples", 0},
{"n_pairs", 0}
};
fruits["n_apples"] += 1;
fruits["n_pairs"] += 1;
Does anyone know if such a library exists, and if not how something like this could be implemented?

A necessary consequence of the "no dynamic allocation" rule is that the underlying data is embedded in your type, so you need to specify the number of keys as a template parameter as well.
If the keys are known at compile time you can construct a fixed-size hash table over that.
In general, the next best thing is either chained hashing or binary search. Here is a small implementation that uses binary search over a std::array<std::pair<K,V>, N>:
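#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <stdexcept>
#include <utility>

template <class K, class V, std::size_t N>
class FixedCapacityMap {
public:
    using value_type = std::pair<K, V>;
    FixedCapacityMap(std::initializer_list<value_type> init) {
        assert(init.size() == N);
        std::copy(init.begin(), init.end(), store.begin());
        // Binary search requires the entries to be sorted by key.
        std::sort(store.begin(), store.end(),
                  [](const value_type& a, const value_type& b) { return a.first < b.first; });
    }
    V& operator[](const K& key) {
        // Compare keys only, so the stored value does not affect the search.
        auto it = std::lower_bound(store.begin(), store.end(), key,
                                   [](const value_type& e, const K& k) { return e.first < k; });
        if (it == store.end() || it->first != key)
            throw std::out_of_range("FixedCapacityMap: key not found");
        return it->second;
    }
private:
    std::array<value_type, N> store;
};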

I was able to find a library with this functionality:
https://github.com/serge-sans-paille/frozen
It allows for constexpr and constinit ordered and unordered maps with (newly added by me) the ability to update values in runtime.

Ordering a container on something else than the key

I am currently trying to implement an A* algorithm and I've run into a problem:
I want to keep a set of distinct objects, identified by a hash (I've used boost::hash and family, but can use anything else) and ordered by a public int value, member of those objects.
The goal is to be able to retrieve the smallest object based on the int value in O(1) and to guarantee uniqueness in the most efficient manner (a hash seemed a good way to achieve that, but I'm open to alternatives). I don't need to iterate over the container if those two conditions are met.
Is there an existing implementation that meets these specifications? Am I mistaken in my assumptions? Should I just extend an existing container?
EDIT :
Apparently unclear on what "smaller based on int value" means. I mean that my object has a public attribute (lets say score). For two objects a and b, a < b if and only if a.score < b.score.
I want a and b to be in a container, ordered by score. And if I try to insert c with c.hash == a.hash, I want the insertion to fail.

Although std::priority_queue is an adapter, its Container template parameter has to satisfy SequenceContainer, so you can't build one backed by a std::set.
It looks like your best option is to maintain both a set and a priority queue, and use the former to control insertion into the latter. It may be a good idea to encapsulate that into a container-concept class, but you might get away with a couple of methods if your use of it is quite localised.
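A minimal sketch of that combination is shown below, using a std::unordered_set of hashes to guard insertion into a std::priority_queue. The names Node, ByScoreDescending and UniqueMinQueue are hypothetical, and the hash is assumed to be precomputed and stored as a long member.

```cpp
#include <cassert>
#include <queue>
#include <unordered_set>
#include <vector>

// Hypothetical node type: `hash` identifies the object, `score` orders it.
struct Node {
    long hash;
    int score;
};

struct ByScoreDescending {
    // priority_queue keeps the "largest" element on top, so the comparison
    // is inverted to get the lowest score on top.
    bool operator()(const Node& a, const Node& b) const { return a.score > b.score; }
};

class UniqueMinQueue {
public:
    // Returns false (and inserts nothing) if the hash is already present.
    bool push(const Node& n) {
        if (!seen_.insert(n.hash).second) return false;
        heap_.push(n);
        return true;
    }
    // Lowest-score element, O(1). Precondition: !empty().
    const Node& top() const { return heap_.top(); }
    // Removes the lowest-score element, O(log N). Precondition: !empty().
    void pop() {
        seen_.erase(heap_.top().hash);
        heap_.pop();
    }
    bool empty() const { return heap_.empty(); }

private:
    std::unordered_set<long> seen_;
    std::priority_queue<Node, std::vector<Node>, ByScoreDescending> heap_;
};
```

Here pop() also forgets the hash, so the same object could be pushed again later. For an A* closed set you may prefer to keep the hash in seen_ after popping; that is a design choice this sketch does not settle.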

use a custom comparator and a std::set :
#include <set>
#include <string>
struct Object
{
int value;
long hash;
std::string data;
Object(int value, std::string data) :
value(value), data(data)
{
}
bool operator<(const Object& other) const
{
return data < other.data;
}
};
struct ObjComp1
{
bool operator()(const Object& lhs, const Object& rhs) const
{
return lhs.value < rhs.value;
}
};
struct ObjComp2
{
bool operator()(const Object& lhs, const Object& rhs) const
{
if (lhs.value != rhs.value)
{
return lhs.value < rhs.value;
}
return lhs < rhs;
}
};
int main()
{
Object o1(5, "a");
Object o2(1, "b");
Object o3(1, "c");
Object o4(1, "c");
std::set<Object, ObjComp1> set;
set.insert(o1);
set.insert(o2);
set.insert(o3);
set.insert(o4);
std::set<Object, ObjComp2> set2;
set2.insert(o1);
set2.insert(o2);
set2.insert(o3);
set2.insert(o4);
return 0;
}
First variant will allow you to only insert o1 and o2, second variant will allow you to insert o1, o2 and o3, as it's not really clear which one you need. The only downside is that you need to code your own operator< for the Object type.
Alternatively, if you don't want to create a custom operator< for your data type, you can wrap a std::map, but this is less straightforward.

You could use the standard library type priority_queue. If your elements are integers, you could do:
priority_queue<int> q;
Priority queues are internally implemented with heaps: a complete binary tree whose root is always the highest-priority element. Note that with the default comparator (std::less) the root is the largest element; to keep the minimum at the root, use priority_queue<int, vector<int>, greater<int>>. Either way, you can read the root in O(1) by calling top().
However, as your algorithm progresses, you will need to extract items with pop(). Since the heap is a binary tree, extraction takes O(log N), which is not O(1), but it is still very good and it is guaranteed, in contrast to the expected time you would get from an imperfect hash table.
I do not know of a way to maintain a set and extract the minimum in O(1).

Replacing std::map with std::set and search by index

Say we have a map with larger objects and an index value. The index value is also part of the larger object.
What I would like to know is whether it is possible to replace the map with a set, extracting the index value.
It is fairly easy to create a set that sorts on a functor comparing two larger objects by extracting the index value.
Which leaves searching by index value, which is not supported by default in a set, I think.
I was thinking of using std::find_if, but I believe that searches linearly, ignoring the fact we have set.
Then I thought of using std::binary_search with a functor comparing the larger object and the value, but I believe that it doesn't work in this case as it wouldn't make use of the structure and would use traversal as it doesn't have a random access iterator. Is this correct? Or are there overloads which correctly handle this call on a set?
And then finally I was thinking of using a boost::container::flat_set, as this has an underlying vector and thus presumably should be able to work well with std::binary_search?
But maybe there is an altogether easier way to do this?
Before you answer "just use a map where a map ought to be used": I am actually using a vector that is manually sorted (well, with std::lower_bound) and was thinking of replacing it with boost::container::flat_set, but it doesn't seem to be easily possible to do so, so I might just stick with the vector.

C++14 introduced heterogeneous lookup: the ability to look up by a key without constructing the entire stored object. This can be used as follows:
#include <cstddef>
#include <iostream>
#include <set>
#include <string>
struct StringRef {
StringRef(const std::string& s):x(&s[0]) { }
StringRef(const char *s):x(s) { std::cout << "works: " << s << std::endl; }
const char *x;
};
struct Object {
long long data;
std::size_t index;
};
struct ObjectIndexer {
ObjectIndexer(Object const& o) : index(o.index) {}
ObjectIndexer(std::size_t index) : index(index) {}
std::size_t index;
};
struct ObjComp {
bool operator()(ObjectIndexer a, ObjectIndexer b) const {
return a.index < b.index;
}
typedef void is_transparent; //Allows the comparison with non-Object types.
};
int main() {
std::set<Object, ObjComp> stuff;
stuff.insert(Object{135, 1});
std::cout << stuff.find(ObjectIndexer(1))->data << "\n";
}
More generally, these sorts of problems where there are multiple ways of indexing your data can be solved using Boost.MultiIndex.

Use boost::intrusive::set which can utilize the object's index value directly. It has a find(const KeyType & key, KeyValueCompare comp) function with logarithmic complexity. There are also other set types based on splay trees, AVL trees, scapegoat trees etc. which may perform better depending on your requirements.

If you add the following to your contained object type:
less than operator that only compares the object indices
equality operator that only compares the object indices
a constructor that takes your index type and initializes a dummy object with that value for the index
then you can pass your index type to find, lower_bound, equal_range, etc... and it will act the way you want. When you pass your index to the set's (or flat_set's) find methods it will construct a dummy object of the contained type to use for the comparisons.
Now if your object is really big, or expensive to construct, this might not be the way you want to go.
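The three additions listed above can be sketched as follows. Record and find_by_index are hypothetical names, and the dummy object simply sets data to zero; this is an illustration of the idea, not code from the answer.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

struct Record {
    std::size_t index;
    long long data;

    Record(std::size_t i, long long d) : index(i), data(d) {}
    // Converting constructor: builds a dummy Record that carries only the index.
    Record(std::size_t i) : index(i), data(0) {}

    // Ordering and equality look at the index only.
    bool operator<(const Record& other) const { return index < other.index; }
    bool operator==(const Record& other) const { return index == other.index; }
};

// Looks up a record by index; the argument is implicitly converted to a
// dummy Record before std::set::find compares it. Returns nullptr if absent.
inline const Record* find_by_index(const std::set<Record>& records, std::size_t index) {
    auto it = records.find(index);
    return it == records.end() ? nullptr : &*it;
}
```

Every lookup constructs one temporary Record, which is why this approach is less attractive when the contained type is large or expensive to construct.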

Determining if an unordered vector<T> has all unique elements

Profiling my CPU-bound code has suggested that I spend a long time checking whether a container contains only unique elements. Assuming that I have some large container of unsorted elements (with < and == defined), I have two ideas on how this might be done:
The first using a set:
template <class T>
bool is_unique(vector<T> X) {
set<T> Y(X.begin(), X.end());
return X.size() == Y.size();
}
The second looping over the elements:
template <class T>
bool is_unique2(vector<T> X) {
typename vector<T>::iterator i,j;
for(i=X.begin();i!=X.end();++i) {
for(j=i+1;j!=X.end();++j) {
if(*i == *j) return 0;
}
}
return 1;
}
I've tested them as best I can, and from what I can gather from reading the documentation about the STL, the answer is (as usual) that it depends. I think that in the first case, if all the elements are unique it is very quick, but if there is a large degeneracy the operation seems to take O(N^2) time. For the nested iterator approach the opposite seems to be true: it is lightning fast if X[0]==X[1] but takes (understandably) O(N^2) time if all the elements are unique.
Is there a better way to do this, perhaps an STL algorithm built for this very purpose? If not, are there any suggestions to eke out a bit more efficiency?

Your first example should be O(N log N) as set takes log N time for each insertion. I don't think a faster O is possible.
The second example is obviously O(N^2). The coefficient and memory usage are low, so it might be faster (or even the fastest) in some cases.
It depends what T is, but for generic performance, I'd recommend sorting a vector of pointers to the objects.
template< class T >
bool dereference_less( T const *l, T const *r )
{ return *l < *r; }
template <class T>
bool is_unique(vector<T> const &x) {
vector< T const * > vp;
vp.reserve( x.size() );
for ( size_t i = 0; i < x.size(); ++ i ) vp.push_back( &x[i] );
sort( vp.begin(), vp.end(), ptr_fun( &dereference_less<T> ) ); // O(N log N)
return adjacent_find( vp.begin(), vp.end(),
not2( ptr_fun( &dereference_less<T> ) ) ) // "opposite functor"
== vp.end(); // true if every adjacent pair (vp_n, vp_n+1) has *vp_n < *vp_n+1
}
or in STL style,
template <class I>
bool is_unique(I first, I last) {
typedef typename iterator_traits<I>::value_type T;
…
And if you can reorder the original vector, of course,
template <class T>
bool is_unique(vector<T> &x) {
sort( x.begin(), x.end() ); // O(N log N)
return adjacent_find( x.begin(), x.end() ) == x.end();
}

You must sort the vector if you want to quickly determine if it has only unique elements. Otherwise the best you can do is O(n^2) runtime or O(n log n) runtime with O(n) space. I think it's best to write a function that assumes the input is sorted.
template<class Fwd>
bool is_unique(Fwd first, Fwd last)
{
    return adjacent_find(first, last) == last;
}
Then have the client sort the vector, or make a sorted copy of the vector. This will open a door for dynamic programming. That is, if the client sorted the vector in the past, then they have the option to keep and refer to that sorted vector so they can repeat this operation in O(n) runtime.

The standard library has std::unique, but that would require you to make a copy of the entire container (note that in both of your examples you make a copy of the entire vector as well, since you unnecessarily pass the vector by value).
template <typename T>
bool is_unique(std::vector<T> vec)
{
std::sort(vec.begin(), vec.end());
return std::unique(vec.begin(), vec.end()) == vec.end();
}
Whether this would be faster than using a std::set would, as you know, depend :-).

Is it infeasible to just use a container that provides this "guarantee" from the get-go? Would it be useful to flag a duplicate at the time of insertion rather than at some point in the future? When I've wanted to do something like this, that's the direction I've gone; just using the set as the "primary" container, and maybe building a parallel vector if I needed to maintain the original order, but of course that makes some assumptions about memory and CPU availability...

For one thing you could combine the advantages of both: stop building the set, if you have already discovered a duplicate:
template <class T>
bool is_unique(const std::vector<T>& vec)
{
std::set<T> test;
for (typename std::vector<T>::const_iterator it = vec.begin(); it != vec.end(); ++it) {
if (!test.insert(*it).second) {
return false;
}
}
return true;
}
BTW, Potatoswatter makes a good point that in the generic case you might want to avoid copying T, in which case you might use a std::set<const T*, dereference_less> instead.
You could of course potentially do much better if it wasn't generic. E.g. if you had a vector of integers of known range, you could just mark in an array (or even a bitset) whether an element exists.

You can use std::unique, but it requires the range to be sorted first:
template <class T>
bool is_unique(vector<T> X) {
std::sort(X.begin(), X.end());
return std::unique(X.begin(), X.end()) == X.end();
}
std::unique modifies the sequence and returns an iterator to the end of the unique set, so if that's still the end of the vector then it must be unique.
This runs in O(n log n), the same as your set example. I don't think you can theoretically guarantee to do it faster, although using a C++11 std::unordered_set instead of std::set would do it in expected linear time. That requires your elements to be hashable as well as having operator== defined, which might not be so easy.
Also, if you're not modifying the vector in your examples, you'd improve performance by passing it by const reference, so you don't make an unnecessary copy of it.
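The std::unordered_set variant mentioned above might look like the following sketch. The name is_unique_hashed is hypothetical, and the code assumes std::hash<T> and operator== are available for T.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Expected O(N) uniqueness check; requires std::hash<T> and operator==.
template <class T>
bool is_unique_hashed(const std::vector<T>& values) {
    std::unordered_set<T> seen;
    seen.reserve(values.size());  // avoid rehashing while inserting
    for (const T& v : values) {
        // insert() returns {iterator, bool}; false means v was already present.
        if (!seen.insert(v).second) return false;
    }
    return true;
}
```

Taking the vector by const reference avoids the copy, and returning at the first duplicate keeps the best case cheap.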

If I may add my own 2 cents.
First of all, as @Potatoswatter remarked, unless your elements are cheap to copy (built-in/small PODs) you'll want to use pointers to the original elements rather than copying them.
Second, there are 2 strategies available.
Simply ensure no duplicate is inserted in the first place. This means, of course, controlling the insertion, which is generally achieved by creating a dedicated class (with the vector as an attribute).
Whenever the property is needed, check for duplicates.
I must admit I would lean toward the first. Encapsulation, clear separation of responsibilities and all that.
Anyway, there are a number of ways depending on the requirements. The first question is:
do we have to leave the elements in the vector in a particular order, or can we "mess" with them?
If we can mess with them, I would suggest keeping the vector sorted: Loki::AssocVector should get you started.
If not, then we need to keep an index on the structure to ensure this property... wait a minute: Boost.MultiIndex to the rescue?
Thirdly: as you remarked yourself, a simple nested linear search yields O(N^2) complexity on average, which is no good.
If < is already defined, then sorting is the obvious choice, with its O(N log N) complexity.
It might also be worth making T Hashable, because a std::tr1::unordered_set could yield a better time (I know, you need a RandomAccessIterator, but if T is Hashable then it's easy to make T* Hashable too ;) )
But in the end the real issue here is that our advice is necessarily generic because we lack data.
What is T? Do you intend the algorithm to be generic?
What is the number of elements? 10, 100, 10,000, 1,000,000? Because asymptotic complexity is kind of moot when dealing with a few hundred...
And of course: can you ensure uniqueness at insertion time? Can you modify the vector itself?

Well, your first one should only take O(N log(N)), so it clearly has the better worst-case behaviour for this application.
However, you should be able to get a better best case if you check as you add things to the set:
template <class T>
bool is_unique3(vector<T> X) {
set<T> Y;
typename vector<T>::const_iterator i;
for(i=X.begin(); i!=X.end(); ++i) {
if (Y.find(*i) != Y.end()) {
return false;
}
Y.insert(*i);
}
return true;
}
This should have O(1) best case, O(N log(N)) worst case, and average case depends on the distribution of the inputs.

If the type T you store in your vector is large and copying it is costly, consider creating a vector of pointers or iterators to your vector elements. Sort it based on the elements pointed to and then check for uniqueness.
You can also use std::set for that. The template looks like this:
template <class Key,class Traits=less<Key>,class Allocator=allocator<Key> > class set
I think you can provide an appropriate Traits parameter and insert raw pointers for speed, or implement a simple wrapper class for pointers with a < operator.
Don't use the constructor for inserting into the set. Use the insert method. One of its overloads has the signature
pair <iterator, bool> insert(const value_type& _Val);
By checking the result (the second member) you can often detect a duplicate much more quickly than if you inserted all elements.
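Put together, the pointer-based set with an early exit might look like this sketch. The comparator dereference_less and the function is_unique_by_pointer are illustrative names; the comparator plays the role of the Traits parameter above.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Compares pointers by the values they point to.
template <class T>
struct dereference_less {
    bool operator()(const T* l, const T* r) const { return *l < *r; }
};

// Checks uniqueness without copying any T: the set stores pointers
// into the vector, ordered by the pointed-to values.
template <class T>
bool is_unique_by_pointer(const std::vector<T>& values) {
    std::set<const T*, dereference_less<T>> seen;
    for (const T& v : values) {
        // insert() reports through .second whether the element was new.
        if (!seen.insert(&v).second) return false;
    }
    return true;
}
```

The pointers stay valid because the vector is not modified while the set exists.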

This applies in the (very) special case of discrete values in a known, not too large range [0, N).
You should be able to start a bucket sort and simply check that the number of values in each bucket stays below 2.
bool is_unique(const vector<int>& X, int N)
{
vector<int> buckets(N,0);
typename vector<int>::const_iterator i;
for(i = X.begin(); i != X.end(); ++i)
if(++buckets[*i] > 1)
return false;
return true;
}
The complexity of this would be O(n).

Using the current C++ standard containers, you have a good solution in your first example. But if you can use a hash container, you might be able to do better, as the hash set will take n * O(1) expected time instead of n * O(log n) for a standard set. Of course everything will depend on the size of n and your particular library implementation.
