C++: 'unique vector' data structure - c++

I need a data structure like std::vector or std::list whose elements are unique. Most of the time I will call push_back on it, and occasionally erase. When I insert an element that is already present, I need to be notified, either by a boolean return value or by an exception.
The most important property it should have is insertion order: each time I iterate over it, it should return the elements in the order they were inserted.
Put another way: I want a queue that guarantees the uniqueness of its elements. But I don't want to pop elements; instead I want to iterate over them, just as with a vector or list.
What is the best data structure for my needs?

You can use a std::set.
Its insert method returns a pair<iterator,bool>. The bool is false when the element already exists in the set (the element is not added in that case). Note that a std::set iterates in sorted order rather than insertion order, so on its own it only covers the uniqueness check.
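A minimal sketch of that duplicate check. The helper name insert_unique is hypothetical and not part of the standard library; it only wraps the bool that std::set::insert already returns.

```cpp
#include <cassert>
#include <set>

// Hypothetical helper: returns true if `value` was newly inserted,
// false if an equal element was already in the set.
template <typename T>
bool insert_unique(std::set<T>& s, const T& value) {
    // std::set::insert returns std::pair<iterator, bool>;
    // the bool reports whether the insertion took place.
    return s.insert(value).second;
}
```

Calling insert_unique(s, 3) twice on an empty std::set<int> should return true the first time and false the second.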

Use a struct with a regular std::vector and a std::set.
When you push, check the set for existence of the element. When you need to iterate, iterate over the vector. If you need to erase from the vector, also erase from the set.
Basically, use the set as an aside, only for fast "presence of an element" check.
// sketch: a class template, so that T below names the element type
template <typename T>
class unique_vector
{
public:
    // implement the operations you need, such as push_back, erase and operator[]
    // alternatively, consider inheriting from std::vector
private:
    std::set<T> m_set; // fast lookup for existence of elements
    std::vector<T> m_vector; // elements in insertion order
};
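One way the skeleton above might be filled in, as a sketch rather than a finished container. It assumes T supports both operator< (for the set) and operator== (for std::find), and it exposes only const iteration so that callers cannot break the uniqueness invariant. The member functions shown are illustrative choices, not a fixed interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Sketch of a vector that rejects duplicates and keeps insertion order.
template <typename T>
class unique_vector {
public:
    // Returns true if the value was appended, false if it was already present.
    bool push_back(const T& value) {
        if (!m_set.insert(value).second)
            return false;
        m_vector.push_back(value);
        return true;
    }

    // Removes the value if present; linear in the size of the vector.
    bool erase(const T& value) {
        if (m_set.erase(value) == 0)
            return false;
        m_vector.erase(std::find(m_vector.begin(), m_vector.end(), value));
        return true;
    }

    std::size_t size() const { return m_vector.size(); }
    const T& operator[](std::size_t i) const { return m_vector[i]; }

    // Read-only iteration in insertion order.
    typename std::vector<T>::const_iterator begin() const { return m_vector.begin(); }
    typename std::vector<T>::const_iterator end() const { return m_vector.end(); }

private:
    std::set<T> m_set;       // fast lookup for existence of elements
    std::vector<T> m_vector; // elements in insertion order
};
```

The trade-off is that every element is stored twice, and erase still costs a linear search in the vector even though the presence check is logarithmic.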

I would prefer std::unordered_set for tracking which elements are already in the std::vector: its lookup is O(1) on average, whereas the lookup time of std::set is O(log n).

You can use Boost.MultiIndex for this:
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/sequenced_index.hpp>
#include <boost/multi_index/hashed_index.hpp>
#include <boost/multi_index/identity.hpp>
using namespace boost::multi_index;
template<typename T>
using unique_list=multi_index_container<
T,
indexed_by<
sequenced<>,
hashed_unique<identity<T>>
>
>;
#include <iostream>
int main()
{
unique_list<int> l;
auto print=[&](){
const char* comma="";
for(const auto& x:l){
std::cout<<comma<<x;
comma=",";
}
std::cout<<"\n";
};
l.push_back(0);
l.push_back(1);
l.push_back(2);
l.push_back(0);
l.push_back(2);
l.push_back(4);
print();
}
Output
0,1,2,4

Related

Store selected fields from an unordered set on struct to a vector

I have an unordered_set that stores the following struct
struct match_t{
size_t score;
size_t ci;
};
typedef std::unordered_set<match_t> uniq_t;
Now I want to store the elements of uniq_t myset; to a vector, but in doing so, I want to copy just the score and not the entire struct. I have seen solutions for assigning the elements using assign or back_inserter. I was wondering how to select just the required fields from the struct. I don't see any parameter in assign or back_inserter for this purpose.
Should I try overriding push_back method for the vector or are there other methods for doing this?
EDIT 1
Do I get any performance improvements by using any of these methods instead of looping over the set and assigning the required values?
There is nothing wrong with a simple for loop:
std::unordered_set<match_t> myset;
std::vector<std::size_t> myvec;
myvec.reserve(myset.size()); // allocate memory only once
for (const auto& entry : myset)
myvec.push_back(entry.score);
Alternatively, you could use std::transform with a custom lambda:
#include <algorithm>
#include <iterator> // for std::back_inserter
std::transform(myset.cbegin(), myset.cend(), std::back_inserter(myvec),
[](const auto& entry){ return entry.score; });
Another way is to use a range library, e.g. range-v3:
#include <range/v3/view/transform.hpp>
std::vector<std::size_t> myvec = myset | ranges::view::transform(&match_t::score);
Performance-wise, you can't avoid the linear pass over all match_t objects. The important tweak instead is to minimize the number of allocations. As the size of the resulting std::vector is known a priori, a call to std::vector::reserve as shown above makes sure that no unnecessary allocations occur.
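For completeness, a self-contained sketch of the std::transform approach. The question does not show how match_t is hashed or compared, so operator== and match_hash below are assumptions made only so that the unordered_set compiles. The function name scores_of is hypothetical as well.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <unordered_set>
#include <vector>

struct match_t {
    std::size_t score;
    std::size_t ci;
};

// Assumed for this sketch: two matches are equal when both fields are equal.
inline bool operator==(const match_t& a, const match_t& b) {
    return a.score == b.score && a.ci == b.ci;
}

// Assumed hash; any function consistent with operator== would do.
struct match_hash {
    std::size_t operator()(const match_t& m) const {
        return std::hash<std::size_t>()(m.score) ^ (std::hash<std::size_t>()(m.ci) << 1);
    }
};

using uniq_t = std::unordered_set<match_t, match_hash>;

// Copies only the score of each element into a new vector.
std::vector<std::size_t> scores_of(const uniq_t& myset) {
    std::vector<std::size_t> myvec;
    myvec.reserve(myset.size()); // one allocation up front
    std::transform(myset.cbegin(), myset.cend(), std::back_inserter(myvec),
                   [](const match_t& entry) { return entry.score; });
    return myvec;
}
```

The scores come out in the set's iteration order, which is unspecified, so sort the result if a particular order matters.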

Container with key and sorting criteria separate

I want to have a collection of items that is searchable by a key (an unsigned value), but I want the elements to be sorted by a different criterion, namely the last-accessed time (which is part of the value).
How can I achieve this in C++? I can sort them separately on demand, but can I create the container itself such that sorting happens automatically?
Are there ready-made containers (in Boost) that have a similar feature built in?
You could probably implement something of this kind, using std::list and std::unordered_map pointing to each other.
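#include <cassert>
#include <list>
#include <unordered_map>
template <typename A>
struct Cache {
    using key = unsigned;
    struct Composite {
        A *a; // non-owning pointer; the caller keeps the object alive
        typename std::list<key>::iterator it; // position of the key in `list`
    };
    std::unordered_map<key, Composite> map;
    std::list<key> list; // most recently accessed key is at the front
    void insert(key k, A &a) { // Assuming inserting counts as accessing
        auto found = map.find(k);
        if (found != map.end())
            list.erase(found->second.it); // drop the stale position of a re-inserted key
        list.push_front(k);
        map[k] = Composite{&a, list.begin()};
    }
    A &operator[](key k) {
        Composite &c = map.at(k); // throws std::out_of_range for an unknown key
        list.splice(list.begin(), list, c.it); // move to the front; c.it stays valid
        return *c.a;
    }
    A &last_accessed() { // or whatever else you wish to implement
        assert(!list.empty());
        return *map.at(list.front()).a;
    }
};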
This solution is optimized for keeping track of which element was accessed last. If you want to sort given a different attribute, you can follow a similar process but use an std::set to store the values with your comparison function, and then iterators to that from an std::unordered_map hashed with a key of your choice.
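The std::set variant described above could look roughly like this. It is a sketch under assumptions: the class name TimeIndex, the member names, and the use of an unsigned long as a stand-in timestamp are all hypothetical, and only the keys are stored to keep the example short.

```cpp
#include <cassert>
#include <set>
#include <unordered_map>
#include <utility>

// Hypothetical index: look up by key, keep entries ordered by a timestamp.
class TimeIndex {
public:
    using key = unsigned;
    using stamp = unsigned long;         // stand-in for a last-access time
    using entry = std::pair<stamp, key>; // ordered by time, then by key

    // Inserts the key, or moves it to its new position in the ordering.
    void touch(key k, stamp t) {
        auto found = by_key_.find(k);
        if (found != by_key_.end())
            ordered_.erase(found->second); // remove the old position
        by_key_[k] = ordered_.insert(entry(t, k)).first;
    }

    bool contains(key k) const { return by_key_.count(k) != 0; }

    // Key with the smallest timestamp, i.e. the least recently accessed one.
    key oldest() const {
        assert(!ordered_.empty());
        return ordered_.begin()->second;
    }

private:
    std::set<entry> ordered_;
    std::unordered_map<key, std::set<entry>::iterator> by_key_;
};
```

This works because std::set iterators stay valid until their own element is erased, so the map can safely hold them. Boost.MultiIndex offers the same combination of a hashed index and an ordered index in a single container.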

Comparing unordered_map vs unordered_set

First of all, what is the main difference between them?
The only thing I've found is that unordered_set has no operator[].
How should I access an element in an unordered_set, since there is no operator[]?
Which container uses random access to memory (or do both)?
And which one of them is faster in any sense, or uses less memory?
They are nearly identical. unordered_set only contains keys, and no values. There is no mapping from a key to a value, so no need for an operator[]. unordered_map maps a key to a value.
You can use the find member function of unordered_set (or count) to locate elements.
You can use iterators to access elements.
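#include <iostream>
#include <string>
#include <unordered_set>
int main()
{
    std::unordered_set<std::string> u{"Dog", "Cat", "Rat", "Parrot", "bee"};
    // iterate over all elements (the order is unspecified)
    for (const auto& s : u) {
        std::cout << s << ' ';
    }
    // look up one element; find returns u.end() if it is absent
    std::unordered_set<std::string>::const_iterator point = u.find("bee");
    if (point != u.end())
        std::cout << "\nfound " << *point << '\n';
}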
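To make the key-only versus key-to-value distinction concrete, here is a small sketch that uses each container for the job it fits. The function names count_distinct and count_words are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// unordered_set stores keys only: here it collapses duplicates.
std::size_t count_distinct(const std::vector<std::string>& words) {
    std::unordered_set<std::string> seen(words.begin(), words.end());
    return seen.size();
}

// unordered_map stores a value per key, which is what operator[] gives access to.
std::unordered_map<std::string, int> count_words(const std::vector<std::string>& words) {
    std::unordered_map<std::string, int> counts;
    for (const auto& w : words)
        ++counts[w]; // inserts the key with value 0 on first use
    return counts;
}
```

Both containers are hash tables with average constant-time lookup. The map simply carries an extra value next to each key.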
How should I access an element in unordered_set (C++17)?
In C++17, a new member function, extract, was added to unordered_set. It removes the element's node from the set and hands it to the caller.
Notably, this is the only way to take a move-only object out of the set.
https://en.cppreference.com/w/cpp/container/unordered_set/extract
For example, suppose you want the third element of your unordered set (in iteration order, which is unspecified).
Advance the iterator:
std::advance(it,2);
Then extract the value:
s.extract(it).value();
Here is the complete code. Try it on any C++17 compiler.
#include <iostream>
#include <string>
#include <unordered_set>
#include <iterator>
int main()
{
//CREATE AN OBJECT
std::unordered_set<std::string> s;
//INSERT DATA
s.insert("aee");
s.insert("bee");
s.insert("cee");
s.insert("dee");
//NEED TO INCLUDE "iterator" HEADER TO USE "std::advance"
auto it = s.begin();
std::advance(it,2);
//USING EXTRACT
std::string sval = s.extract(it).value();
std::cout<<sval;
}
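Note: there is no bounds checking. Advancing the iterator past end() is undefined behavior, so an out-of-range index may appear to do nothing, but no particular result is guaranteed.
For example, this change to the code above is invalid: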
//ONLY FOUR ELEMENTS
std::advance(it,8);
//USING EXTRACT
std::string sval = s.extract(it).value();
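Because std::advance performs no bounds checking, a guarded wrapper can make the out-of-range case explicit. The sketch below assumes a C++17 compiler; the helper name extract_at and its bool-plus-out-parameter interface are hypothetical choices.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <unordered_set>
#include <utility>

// Hypothetical helper: removes the element at position `index` of the set's
// (unspecified) iteration order and stores it in `out`.
// Returns false, leaving the set untouched, if `index` is out of range.
bool extract_at(std::unordered_set<std::string>& s, std::size_t index, std::string& out) {
    if (index >= s.size())
        return false; // advancing past end() would be undefined behavior
    auto it = s.begin();
    std::advance(it, static_cast<std::ptrdiff_t>(index));
    out = std::move(s.extract(it).value()); // the node now belongs to the temporary handle
    return true;
}
```

After a successful call the set holds one element fewer, since extract removes the node rather than copying the value.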

About erasing an iterator of a Boost multi-index container

I'd like to delete some elements from a Boost multi-index container by erasing iterators while visiting the collection.
What I am not sure about is whether any iterator invalidation is involved, and whether my code below would invalidate the first and last iterators.
If the code below is incorrect, what is the best way to do this, considering the specific index (ordered_unique) below?
#include <iostream>
#include <stdint.h>
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/key_extractors.hpp>
#include <boost/shared_ptr.hpp>
using namespace std;
class MyClass{
public:
MyClass(int32_t id) : id_(id) {}
int32_t id() const
{ return id_; }
private:
int32_t id_;
};
typedef boost::shared_ptr<MyClass> MyClass_ptr;
typedef boost::multi_index_container<
MyClass_ptr,
boost::multi_index::indexed_by<
boost::multi_index::ordered_unique<
boost::multi_index::const_mem_fun<MyClass,int32_t,&MyClass::id>
>
>
> Coll;
int main() {
Coll coll;
// ..insert some entries 'coll.insert(MyClass_ptr(new MyClass(12)));'
Coll::iterator first = coll.begin();
Coll::iterator last = coll.end();
while(first != last) {
if((*first)->id() == 3)
coll.erase(first++);
else
++first;
}
}
The reason that erase for containers returns an iterator is to use that result:
first = coll.erase(first);
Then you don't have to worry about how the underlying implementation handles erase or whether it shifts elements around. (In a vector, for instance, your code would have skipped an element during iteration.) However, the documentation does state that:
It is tempting to see random access indices as an analogue of std::vector for use in Boost.MultiIndex, but this metaphor can be misleading, as both constructs, though similar in many respects, show important semantic differences. An advantage of random access indices is that their iterators, as well as references to their elements, are stable, that is, they remain valid after any insertions or deletions.
Still, just seeing coll.erase(first++) is a red flag for me, so I prefer to use the iterator returned by erase.
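The full loop with the returned iterator might look like this. To keep the sketch compilable without Boost, it uses std::set<int> as a stand-in for the multi-index container; the function name erase_even and the "even value" condition are hypothetical.

```cpp
#include <cassert>
#include <set>

// Removes every even value while iterating, using the iterator returned by erase.
void erase_even(std::set<int>& coll) {
    for (auto it = coll.begin(); it != coll.end(); ) {
        if (*it % 2 == 0)
            it = coll.erase(it); // erase returns the iterator following the removed element
        else
            ++it;
    }
}
```

Comparing against coll.end() on every pass, instead of caching it in a variable, also keeps the loop correct for containers whose end iterator can be invalidated by erase.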

Removing items from a vector

I'm looking for the most efficient way to remove multiple items from a vector.
Basically I will be searching for a flag within the vector and removing any objects that have that flag.
However, I have heard that erasing an object from a vector will mess up your iterators, so what is the most efficient way to loop through a vector (containing potentially thousands of objects) and remove those with a specific flag?
I am hoping not to have to loop through the vector multiple times.
If multiple elements may match the flag, you should use std::remove_if():
vec.erase(std::remove_if(vec.begin(), vec.end(), [](T const& e){ return e.flag(); }),
vec.end());
Using this approach moves each vector element at most once. Removing individual elements may move each element O(n) times.
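A self-contained sketch of the erase-remove idiom follows. The Item struct and the function name remove_flagged are hypothetical stand-ins for the asker's element type.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical element type with a flag marking it for removal.
struct Item {
    int id;
    bool flagged;
};

// Removes all flagged items in a single pass; the order of the survivors is kept.
void remove_flagged(std::vector<Item>& vec) {
    // remove_if shifts the kept elements to the front and returns the new logical end;
    // erase then drops the leftover tail in one call.
    vec.erase(std::remove_if(vec.begin(), vec.end(),
                             [](const Item& e) { return e.flagged; }),
              vec.end());
}
```

No iterator is held across the erase call, so the invalidation concern from the question does not arise.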
The std::remove_if algorithm can sometimes be coupled elegantly with other utilities. For example, if your class looks like this:
struct Foo
{
bool flag; // either this...
bool get_flag() const; // ... or this
// ...
};
Then you can use std::mem_fn to generate an accessor functor that returns the value of the member or invokes the member function, respectively:
std::mem_fn(&Foo::flag)
std::mem_fn(&Foo::get_flag)
Finally, you can rely on argument-dependent lookup: unqualified calls such as remove_if, begin and end are found in namespace std as soon as one of the argument types comes from that namespace. For example:
#include <algorithm> // for remove_if
#include <functional> // for mem_fn
#include <iterator> // for begin, end
#include <vector> // for vector
std::vector<Foo> v = /* something */ ;
v.erase(remove_if(begin(v), end(v), std::mem_fn(&Foo::flag)), end(v));