Queue with unique entries in C++

I need to implement a queue containing unique entries (no duplicates) in C or C++. I am thinking of maintaining a separate record of the elements already in the queue, but that seems very inefficient.
Please let me know your suggestions for tackling this.

How about an auxiliary data structure to track uniqueness:
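std::queue<Foo> q;
// Foo must provide operator<; the comparator looks through the wrapper
struct FooRefLess
{
    bool operator()(Foo const & a, Foo const & b) const { return a < b; }
};
std::set<std::reference_wrapper<Foo const>, FooRefLess> s;
// to add:
void add(Foo const & x)
{
    if (s.find(x) == s.end())
    {
        q.push(x); // std::queue offers push(), not push_back()
        s.insert(std::cref(q.back())); // or "s.emplace(q.back());"
    }
}
// when removing: erase q.front() from s before calling q.pop()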
Or, alternatively, reverse the roles of the queue and the set:
std::set<Foo> s;
std::queue<std::reference_wrapper<Foo const>> q; // set elements are const
void add(Foo const & x)
{
    auto p = s.insert(x); // std::pair<std::set<Foo>::iterator, bool>
    if (p.second)
    {
        q.push(std::cref(*p.first)); // or "q.emplace(*p.first);"
    }
}

Queuing:
use a std::set to maintain your set of unique elements
push onto the std::queue only those elements that were successfully inserted into the std::set
Dequeuing:
remove the element from both the std::queue and the std::set
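The two steps above can be sketched as a small wrapper. This is only an illustration under assumptions: the class name UniqueQueue and its members are hypothetical, T is assumed to be copyable and to provide operator<, and each element is stored twice (once in the queue, once in the set).

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <set>

// Hypothetical helper: a FIFO queue that rejects duplicates.
// Assumes T is copyable and comparable with operator<.
template <typename T>
class UniqueQueue {
public:
    // Returns true if the value was enqueued, false if it was already present.
    bool push(const T& value) {
        if (!seen_.insert(value).second) {
            return false;  // the set refused it, so it is a duplicate
        }
        order_.push(value);
        return true;
    }

    // Removes the oldest element from both structures.
    void pop() {
        assert(!order_.empty());
        seen_.erase(order_.front());
        order_.pop();
    }

    const T& front() const { return order_.front(); }
    bool empty() const { return order_.empty(); }
    std::size_t size() const { return order_.size(); }

private:
    std::queue<T> order_;  // preserves insertion order
    std::set<T> seen_;     // answers "is it already queued?"
};
```

Once an element has been popped it may be pushed again, because it is erased from the set at the same time.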

std::queue is a container adaptor and uses relatively few members of the underlying Container. You can easily implement a custom container that contains both an unordered_set of reference_wrapper<const T> and a deque<T>. For the operations used here it needs at least the members front, push_back, pop_front and empty. Check that set when push_back of your container is called and reject duplicates accordingly (possibly by throwing). A complete example:
#include <iostream>
#include <set>
#include <deque>
#include <queue>
#include <unordered_set>
#include <functional>
namespace std {
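// partial specialization for reference_wrapper; std::hash provides none.
// Note: passing a custom hasher to the unordered_set would avoid adding
// a specialization to namespace std.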
template<typename T>
class hash<std::reference_wrapper<T>> {
public:
std::size_t operator()(std::reference_wrapper<T> x) const
{ return std::hash<T>()(x.get()); }
};
}
template <typename T>
class my_container {
// important: this really needs to be a deque, and elements may only be
// inserted or removed at the ends, so that stored references never dangle
typedef std::deque<T> storage;
typedef std::reference_wrapper<const T> c_ref_w;
typedef std::reference_wrapper<T> ref_w;
public:
typedef typename storage::value_type value_type;
typedef typename storage::reference reference;
typedef typename storage::const_reference const_reference;
typedef typename storage::size_type size_type;
// no move semantics
void push_back(const T& t) {
auto it = lookup_.find(std::cref(t));
if(it != end(lookup_)) {
// already inserted: report an error here if desired
return;
}
store_.push_back(t);
// reference the stored copy, not the argument, to avoid a dangling reference
lookup_.insert(store_.back());
}
// trivial functions
bool empty() const { return store_.empty(); }
const T& front() const { return store_.front(); }
T& front() { return store_.front(); }
void pop_front() { lookup_.erase(store_.front()); store_.pop_front(); }
private:
// look-up mechanism
std::unordered_set<c_ref_w> lookup_;
// underlying storage
storage store_;
};
int main()
{
// reference wrapper for int ends up being silly
// but good for larger objects
std::queue<int, my_container<int>> q;
q.push(2);
q.push(3);
q.push(2);
q.push(4);
while(!q.empty()) {
std::cout << q.front() << std::endl;
q.pop();
}
return 0;
}
EDIT: You will want to make my_container a proper model of container (maybe also allocators), but this is another full question. Thanks to Christian Rau for pointing out bugs.

There is one very important point you've not mentioned in your question: whether your queue of items is sorted or otherwise ordered (a priority queue), or unsorted (a plain FIFO). The solution you choose will depend on the answer to this question.
If your queue is unsorted, then maintaining an extra data structure in addition to your queue will be more efficient. Using a second structure that is ordered in some way to track the contents of your queue lets you check whether an item already exists in the queue much more quickly than scanning the queue itself. Adding to the end of an unsorted queue takes constant time and can be done very efficiently.
If your queue must be sorted, then placing the item into the queue requires you to know the item's position in the queue, which requires the queue to be searched anyway. Once you know an item's position, you know whether the item is a duplicate, because a duplicate will already exist at that position in the queue. In this case, all the work can be performed on the queue itself, and maintaining any secondary data structure is unnecessary.
The choice of data structures is up to you. However, for the unsorted case the secondary data structure should not be any kind of list or array; otherwise scanning your secondary index will be no more efficient than scanning the original queue itself.
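For the sorted case, the "find the position, then check for a duplicate at that position" idea might look like the sketch below. It assumes the sorted queue is kept in a std::vector and that T provides operator<; the function name insert_unique_sorted is hypothetical. The lookup is a binary search, although inserting into the middle of a vector still shifts elements.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: keeps `sorted` in ascending order without duplicates.
// Returns true if `value` was inserted, false if it was already present.
template <typename T>
bool insert_unique_sorted(std::vector<T>& sorted, const T& value) {
    // Binary search for the first element that is not less than `value`.
    auto pos = std::lower_bound(sorted.begin(), sorted.end(), value);
    if (pos != sorted.end() && !(value < *pos)) {
        return false;  // an equivalent element already occupies that position
    }
    sorted.insert(pos, value);
    return true;
}
```

No secondary index is involved: the position found for the insertion is the only place a duplicate could be.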

Related

c++ how to shrink std::vector to a subselection efficiently

given two vectors
std::vector<SomeStruct> items; //1'000'000 items
std::vector<int> selection; //900'000 unique indices in ascending order
where selection contains valid indices into items, how can I shrink items efficiently to only contain the elements that are initially indexed by selection?
I am going to write this answer in reverse. Bear with me, I hope you will understand.
Let's first write a wrapper that lets us iterate over only the selected items:
#include <iostream>
#include <vector>
struct SomeStruct {};
struct selected_item {
std::vector<SomeStruct>& items;
std::vector<size_t>& selection;
struct iterator {
std::vector<SomeStruct>& items;
std::vector<size_t>::iterator selection_iterator;
SomeStruct& operator *(){
return items[*selection_iterator];
}
iterator& operator++(){
++selection_iterator;
return *this;
}
bool operator!=(const iterator& other){
return selection_iterator != other.selection_iterator;
}
};
iterator begin() { return {items,selection.begin()}; }
iterator end() { return {items,selection.end()};}
};
int main() {
std::vector<SomeStruct> items{{},{},{},{}};
std::vector<size_t> selection{1,3};
for (auto& i : selected_item{items,selection}){
std::cout << "item selected\n";
}
}
Using that you can now write a loop that moves selected items from items into a new vector, then move that new vector into items:
int main() {
std::vector<SomeStruct> items{{},{},{},{}};
std::vector<size_t> selection{1,3};
std::vector<SomeStruct> temp_items;
temp_items.reserve(selection.size());
for (auto& i : selected_item{items,selection}){
temp_items.emplace_back(std::move(i));
}
items = std::move(temp_items);
}
Provided SomeStruct can be moved, this will not copy any SomeStruct. However, moving is not free either. Depending on why you actually want to remove elements from items (why not populate a vector of selected items in the first place, instead of a vector of indices?), you can also consider skipping the move altogether and using only the above wrapper to do whatever you want with the selected items. As 90% of the items are selected, the savings in memory and the more efficient element access (due to a smaller vector) might not outweigh the cost of moving, so you might as well directly do:
int main() {
std::vector<SomeStruct> items{{},{},{},{}};
std::vector<size_t> selection{1,3};
for (auto& i : selected_item{items,selection}){
do_something_with_selected_item(i);
}
}
Another option would be to actually erase elements from items. I did not consider it because I expect it to be rather costly. I might be wrong about that. As always, to know what is more efficient you need to measure.
PS: The wrapper is tested with gcc. I find it a little annoying to write custom iterators, and I am not sure whether it needs e.g. an operator== or a post-increment. I only implemented what was necessary to make gcc happy.
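For comparison, the in-place option mentioned above could be sketched as follows. This is an assumption-laden illustration, not something measured: the name keep_selected is hypothetical, the indices are taken as std::size_t (as in the wrapper above), and the code relies on selection holding unique, valid indices in ascending order as stated in the question.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: keeps only items[selection[k]], preserving order.
// Because the indices are unique and ascending, selection[k] >= k holds,
// so no element is overwritten before it has been moved.
template <typename T>
void keep_selected(std::vector<T>& items,
                   const std::vector<std::size_t>& selection) {
    for (std::size_t k = 0; k < selection.size(); ++k) {
        if (selection[k] != k) {  // avoid a self-move
            items[k] = std::move(items[selection[k]]);
        }
    }
    // Drop the leftover tail in one call.
    items.erase(items.begin() + static_cast<std::ptrdiff_t>(selection.size()),
                items.end());
}
```

This avoids the temporary vector but keeps the original capacity allocated; whether it is faster than moving into a new vector would have to be measured.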

C++ N-last added items container

I am trying to find the optimal data structure for the following simple task: a class that keeps the last N added values in a built-in container. When the object receives item N+1, it should be added at the end of the container and the first item should be removed. It is like a simple queue, but the class should have a GetAverage method, and other methods that need access to every item. Unfortunately, std::queue doesn't have begin and end methods for this purpose.
Here is part of a simple class interface:
class StatItem final
{
static int ITEMS_LIMIT;
public:
StatItem() = default;
~StatItem() = default;
void Reset();
void Insert(int val);
int GetAverage() const;
private:
std::queue<int> _items;
};
And part of desired implementation:
void StatItem::Reset()
{
std::queue<int> empty;
std::swap(_items, empty);
}
void StatItem::Insert(int val)
{
_items.push(val);
if (_items.size() > ITEMS_LIMIT) // drop the oldest once more than ITEMS_LIMIT items are held
{
_items.pop();
}
}
int StatItem::GetAverage() const
{
const size_t itemCount{ _items.size() };
if (itemCount == 0) {
return 0;
}
const int sum = std::accumulate(_items.begin(), _items.end(), 0); // Error: std::queue doesn't have these methods
return sum / itemCount;
}
Any ideas?
I'm not sure about std::deque. Is it efficient, and should I use it for this task, or something different?
P.S.: ITEMS_LIMIT in my case is about 100-500 items.
The data structure you're looking for is a circular buffer. There is an implementation in the Boost library; however, since it doesn't seem that you need to remove items, you can easily implement one using a std::vector or std::array.
You will need to keep track of the number of elements in the vector so far so that you can average correctly until you reach the element limit, and also the current insertion index which should just wrap when you reach that limit.
Using an array or vector will allow you to benefit from having a fixed element limit, as the elements will be stored in a single block of memory (good for fast memory access), and with both data structures you can make space for all elements you need on construction.
If you choose to use a std::vector, make sure to use the 'fill' constructor (http://www.cplusplus.com/reference/vector/vector/vector/), which will allow you to create the right number of elements from the beginning and avoid any extra allocations.
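A minimal sketch of such a circular buffer on top of std::vector is shown below. The class name RingStat and its member names are hypothetical, and the limit is passed to the constructor rather than stored in a static member; the method names Insert, Reset and GetAverage mirror the interface from the question.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical sketch of a fixed-capacity ring buffer of ints.
class RingStat {
public:
    // The "fill" constructor allocates all slots up front.
    explicit RingStat(std::size_t limit) : items_(limit, 0) {}

    void Insert(int val) {
        if (items_.empty()) return;           // capacity 0: nothing to store
        items_[next_] = val;                  // overwrites the oldest once full
        next_ = (next_ + 1) % items_.size();  // wrap the insertion index
        if (count_ < items_.size()) ++count_;
    }

    void Reset() { next_ = 0; count_ = 0; }

    int GetAverage() const {
        if (count_ == 0) return 0;
        // Until the buffer is full, only the first count_ slots hold real values.
        const auto last = items_.begin() + static_cast<std::ptrdiff_t>(count_);
        const long long sum = std::accumulate(items_.begin(), last, 0LL);
        return static_cast<int>(sum / static_cast<long long>(count_));
    }

private:
    std::vector<int> items_;
    std::size_t next_ = 0;   // index of the next slot to write
    std::size_t count_ = 0;  // number of valid elements so far
};
```

Because the storage is a plain vector, any other method that needs to visit every item can iterate over the first count_ slots in the same way; only the order of the elements is not chronological once the index has wrapped.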

C++ insert an element to a vector

I am trying to build a priority queue using a vector that stores each element. First, I want to insert the element into the vector together with its priority. I am not sure whether this is possible; if not, can someone suggest another solution?
Here is my code:
template <typename E>
class PriorityQueue {
private:
std::vector<E> elements;
E value;
int pr;
public:
PriorityQueue() {}
void insert(int priority, E element) {
}
};
Here is how to create an element with priority for vector:
struct PriElement
{
int data;
int pri;
bool operator < ( const PriElement & other ) const
{
return pri < other.pri;
}
};
vector<PriElement> _vector;
However, the real problem is to keep the vector sorted per priority.
Here is a naive implementation showing the bubble up method:
class PriorityQueue{
public:
void insert( int data, int pri )
{
_vector.push_back(PriElement{data, pri}); // PriElement is an aggregate, so use brace initialization
int index = _vector.size() -1;
while ( ( index > 0 )&& (_vector[index] < _vector[index-1] ) )
{
swap(_vector[index],_vector[index-1]);
index--;
}
}
private:
vector<PriElement> _vector;
};
For any real world implementation, as mentioned, use priority_queue.
The standard algorithm (see Introduction To Algorithms chapter 6) for doing this is as follows:
When pushing an item, insert it to the end of the vector, then "bubble" it up to the correct place.
When popping the smallest item, replace the first item (at position 0) with the item at the end, then "bubble" it down to the correct place.
It's possible to show that this can be done in (amortized) logarithmic time (the amortization is due to the vector possibly doubling itself).
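The push and pop steps just described could be sketched as follows, assuming a min-heap of plain ints where the children of index i sit at 2i+1 and 2i+2. The class name MinHeap is hypothetical and the sketch is for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch of a binary min-heap of ints stored in a vector.
class MinHeap {
public:
    void push(int value) {
        data_.push_back(value);
        std::size_t i = data_.size() - 1;
        // Bubble up: swap with the parent while the parent is larger.
        while (i > 0) {
            const std::size_t parent = (i - 1) / 2;
            if (data_[parent] <= data_[i]) break;
            std::swap(data_[parent], data_[i]);
            i = parent;
        }
    }

    int pop() {
        assert(!data_.empty());
        const int top = data_.front();
        data_.front() = data_.back();  // move the last item to position 0
        data_.pop_back();
        // Bubble down: swap with the smaller child while that child is smaller.
        std::size_t i = 0;
        for (;;) {
            const std::size_t left = 2 * i + 1, right = left + 1;
            std::size_t smallest = i;
            if (left < data_.size() && data_[left] < data_[smallest]) smallest = left;
            if (right < data_.size() && data_[right] < data_[smallest]) smallest = right;
            if (smallest == i) break;
            std::swap(data_[i], data_[smallest]);
            i = smallest;
        }
        return top;
    }

    bool empty() const { return data_.empty(); }

private:
    std::vector<int> data_;
};
```

Each loop walks at most one root-to-leaf path, which is where the logarithmic bound comes from.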
However, there is no need to implement this yourself, as the standard library contains std::priority_queue which is a container adapter using std::vector as its default sequence container. For example, if you define
std::priority_queue<int> q;
then q will be a priority queue adapting a vector.
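As a small usage sketch: by default std::priority_queue keeps the largest element on top, and passing std::greater as the comparator turns it into a min-heap. The helper names drain, largest_first and smallest_first below are hypothetical and exist only to make the behaviour visible.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper: empties a priority queue into a vector, top first.
template <typename Queue>
std::vector<int> drain(Queue q) {
    std::vector<int> out;
    while (!q.empty()) {
        out.push_back(q.top());
        q.pop();
    }
    return out;
}

// Default comparator (std::less): the largest element is on top.
inline std::vector<int> largest_first(const std::vector<int>& values) {
    std::priority_queue<int> q(values.begin(), values.end());
    return drain(q);
}

// With std::greater the smallest element is on top (a min-heap).
inline std::vector<int> smallest_first(const std::vector<int>& values) {
    std::priority_queue<int, std::vector<int>, std::greater<int>> q(
        values.begin(), values.end());
    return drain(q);
}
```

For an element type with a separate priority, such as the PriElement struct above, the same adaptor works as long as the element type (or the comparator) defines the ordering.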

How to implement O(1) deletion on min-heap with hashtable

Read the following statement somewhere:
An additional hash table can be used to make deletion fast in min-heap.
Question> How to combine priority_queue and unordered_map so that I can implement the idea above?
#include <queue>
#include <unordered_map>
#include <iostream>
#include <list>
using namespace std;
struct Age
{
Age(int age) : m_age(age) {}
int m_age;
};
// Hash function for Age
class HashAge {
public:
const size_t operator()(const Age &a) const {
return hash<int>()(a.m_age);
}
};
struct AgeGreater
{
bool operator()(const Age& lhs, const Age& rhs) const {
return lhs.m_age > rhs.m_age; // "greater" puts the smallest age on top (min-heap)
}
};
int main()
{
priority_queue<Age, list<Age>, AgeGreater> min_heap; // doesn't work
//priority_queue<Age, vector<Age>, AgeGreater> min_heap;
// Is this the right way to do it?
unordered_map<Age, list<Age>::iterator, HashAge > hashTable;
}
Question> I am not able to make the following work:
priority_queue<Age, list<Age>, AgeGreater> min_heap; // doesn't work
I have to use list as the container because list iterators are not invalidated by insertion/deletion (see the iterator invalidation rules).
You can't do this with the supplied priority_queue data structure:
In a priority queue you don't know where the elements are stored, so it is hard to delete them in constant time, because you can't find the elements. But, if you maintain a hash table with the location of every element in the priority queue stored in the hash table, then you can find and remove an item quickly, although I would expect log(N) time in the worst case, not constant time. (I don't recall offhand if you get amortized constant time.)
To do this you usually need to roll your own data structures, because you have to update the hash table each time an item is moved around in the priority queue.
I have some example code that does this here:
http://code.google.com/p/hog2/source/browse/trunk/algorithms/AStarOpenClosed.h
It's based on older coding styles, but it does the job.
To illustrate:
/**
* Moves a node up the heap. Returns true if the node was moved, false otherwise.
*/
template<typename state, typename CmpKey, class dataStructure>
bool AStarOpenClosed<state, CmpKey, dataStructure>::HeapifyUp(unsigned int index)
{
if (index == 0) return false;
int parent = (index-1)/2;
CmpKey compare;
if (compare(elements[theHeap[parent]], elements[theHeap[index]]))
{
// Perform normal heap operations
unsigned int tmp = theHeap[parent];
theHeap[parent] = theHeap[index];
theHeap[index] = tmp;
// Update the element location in the hash table
elements[theHeap[parent]].openLocation = parent;
elements[theHeap[index]].openLocation = index;
HeapifyUp(parent);
return true;
}
return false;
}
Inside the if statement we do the normal heapify operations on the heap and then update the location in the hash table (openLocation) to point to the current location in the priority queue.
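The same position-tracking idea can be shown in a self-contained form. The sketch below is not the linked code: IndexedMinHeap and all of its members are hypothetical names, it stores unique ints only, and it uses a std::unordered_map from value to heap index. Removal of an arbitrary value then costs an expected constant-time lookup plus a logarithmic repair, in line with the estimate above.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical sketch: a min-heap of unique ints plus a hash table that
// maps each value to its current index in the heap array.
class IndexedMinHeap {
public:
    bool push(int value) {
        if (pos_.count(value)) return false;  // values must be unique
        heap_.push_back(value);
        pos_[value] = heap_.size() - 1;
        sift_up(heap_.size() - 1);
        return true;
    }

    int top() const { assert(!heap_.empty()); return heap_.front(); }

    // Removes an arbitrary value: hash lookup, then O(log n) repair.
    bool remove(int value) {
        const auto it = pos_.find(value);
        if (it == pos_.end()) return false;
        const std::size_t i = it->second;
        swap_nodes(i, heap_.size() - 1);  // move the victim to the end
        heap_.pop_back();
        pos_.erase(value);
        if (i < heap_.size()) {           // repair around the moved element
            sift_up(i);
            sift_down(i);
        }
        return true;
    }

    bool empty() const { return heap_.empty(); }

private:
    // Every swap also updates the hash table.
    void swap_nodes(std::size_t a, std::size_t b) {
        std::swap(heap_[a], heap_[b]);
        pos_[heap_[a]] = a;
        pos_[heap_[b]] = b;
    }
    void sift_up(std::size_t i) {
        while (i > 0 && heap_[i] < heap_[(i - 1) / 2]) {
            swap_nodes(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }
    void sift_down(std::size_t i) {
        for (;;) {
            const std::size_t l = 2 * i + 1, r = l + 1;
            std::size_t m = i;
            if (l < heap_.size() && heap_[l] < heap_[m]) m = l;
            if (r < heap_.size() && heap_[r] < heap_[m]) m = r;
            if (m == i) return;
            swap_nodes(i, m);
            i = m;
        }
    }

    std::vector<int> heap_;
    std::unordered_map<int, std::size_t> pos_;
};
```

The key design point is that no element is ever moved without going through swap_nodes, so the hash table cannot fall out of sync with the heap array.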

Remove item from std::list with only having access to the iterator

std::list is a doubly linked list. Doesn't that mean that it should be possible to remove an item from a list by only having access to the iterator?
Maybe my question wasn't clear enough.
#pragma once
#include <list>
typedef std::list<int> IntList ;
typedef IntList::iterator IntIterator;
class IntHiddenList
{
private:
IntList list;
public:
IntIterator AddInt(int x)
{
list.push_front(x);
return list.begin();
}
};
int main()
{
IntHiddenList a;
IntIterator it = a.AddInt(5);
// How would I go about deleting 5 from the list using only "it"?
}
Yes, notionally it's possible. However, the standard library does not allow it (erasing requires both the container and the iterator).
However, you're in luck: Boost provides boost::intrusive (http://www.boost.org/doc/libs/1_54_0/doc/html/intrusive/list.html), which can do exactly what you want.
No, you will still need the list in order to delete an element.
In the STL, an iterator only holds a pointer to the data and provides operations to move through the container.
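With the standard library, the practical workaround is to let the object that owns the list perform the erase. The sketch below extends the class from the question under that assumption; RemoveInt and Size are hypothetical additions, and the private member is renamed list_ for clarity.

```cpp
#include <cstddef>
#include <list>

typedef std::list<int> IntList;
typedef IntList::iterator IntIterator;

class IntHiddenList {
public:
    IntIterator AddInt(int x) {
        list_.push_front(x);
        return list_.begin();
    }

    // Hypothetical addition: the owner of the list performs the erase.
    // Returns the iterator following the removed element.
    IntIterator RemoveInt(IntIterator it) { return list_.erase(it); }

    std::size_t Size() const { return list_.size(); }

private:
    IntList list_;
};
```

Since std::list::erase only invalidates iterators to the erased element, iterators handed out earlier for other elements stay usable.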