C++: how to shrink std::vector to a subselection efficiently

Given two vectors
std::vector<SomeStruct> items; //1'000'000 items
std::vector<int> selection; //900'000 unique indices in ascending order
where selection contains valid indices into items, how can I shrink items efficiently to only contain the elements that are initially indexed by selection?

I am going to write this answer in reverse. Bear with me, I hope you will understand.
Let's first write a wrapper that lets us iterate over only the selected items:
#include <iostream>
#include <vector>
struct SomeStruct {};
struct selected_item {
std::vector<SomeStruct>& items;
std::vector<size_t>& selection;
struct iterator {
std::vector<SomeStruct>& items;
std::vector<size_t>::iterator selection_iterator;
SomeStruct& operator *(){
return items[*selection_iterator];
}
iterator& operator++(){
++selection_iterator;
return *this;
}
bool operator!=(const iterator& other){
return selection_iterator != other.selection_iterator;
}
};
iterator begin() { return {items,selection.begin()}; }
iterator end() { return {items,selection.end()};}
};
int main() {
std::vector<SomeStruct> items{{},{},{},{}};
std::vector<size_t> selection{1,3};
for (auto& i : selected_item{items,selection}){
std::cout << "item selected\n";
}
}
Using that you can now write a loop that moves selected items from items into a new vector, then move that new vector into items:
int main() {
std::vector<SomeStruct> items{{},{},{},{}};
std::vector<size_t> selection{1,3};
std::vector<SomeStruct> temp_items;
temp_items.reserve(selection.size());
for (auto& i : selected_item{items,selection}){
temp_items.emplace_back(std::move(i));
}
items = std::move(temp_items);
}
Provided SomeStruct can be moved, this will not copy any SomeStruct. However, moving is not free either. Depending on why you actually want to remove elements from items (why not populate a vector of selected items in the first place, instead of populating a vector of indices?), you can also consider skipping the moving altogether and using only the above wrapper to do whatever you want to do with the selected items. As 90% of the items are selected, it might be that the savings in memory and the more efficient element access (due to a smaller vector) do not outweigh the cost of moving, so you might as well directly do:
int main() {
std::vector<SomeStruct> items{{},{},{},{}};
std::vector<size_t> selection{1,3};
for (auto& i : selected_item{items,selection}){
do_something_with_selected_item(i);
}
}
Another option would be to actually erase elements from items. I did not consider it because I expect it to be rather costly. I might be wrong about that. As always, to know what is more efficient you need to measure.
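For what it is worth, here is a minimal sketch of one in-place alternative. It does not erase elements one at a time; instead it moves each selected element forward into its final slot and then trims the tail once. It assumes selection holds unique, valid indices in ascending order (as stated in the question), and keep_selected is a hypothetical name, not something from the standard library. Whether it beats the temporary-vector version would still have to be measured.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Keeps only the elements of `items` whose indices appear in `selection`.
// Assumption: `selection` holds unique, valid indices in ascending order.
template <typename T>
void keep_selected(std::vector<T>& items,
                   const std::vector<std::size_t>& selection) {
    for (std::size_t k = 0; k < selection.size(); ++k) {
        const std::size_t src = selection[k];
        // Unique ascending indices guarantee src >= k, so slot k never
        // holds an element that is still needed later.
        if (src != k) {  // skip self-move
            items[k] = std::move(items[src]);
        }
    }
    // Drop everything behind the compacted prefix in one call.
    items.erase(items.begin() + static_cast<std::ptrdiff_t>(selection.size()),
                items.end());
}
```

This touches each selected element at most once and never reallocates, but it still pays for one move per element that changes position.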
PS: The wrapper is tested with gcc. I find it a little annoying to write custom iterators, and I am not sure if it needs e.g. an operator== or a post-increment. I only implemented what was necessary to make gcc happy.

Related

C++ N-last added items container

I am trying to find the optimal data structure for the following simple task: a class which keeps the N last added item values in a built-in container. If the object receives an (N+1)th item, it should be added at the end of the container and the first item should be removed. It is like a simple queue, but the class should have a GetAverage method, and other methods which must have access to every item. Unfortunately, std::queue doesn't have begin and end methods for this purpose.
Here is part of the class interface:
class StatItem final
{
static int ITEMS_LIMIT;
public:
StatItem() = default;
~StatItem() = default;
void Reset();
void Insert(int val);
int GetAverage() const;
private:
std::queue<int> _items;
};
And part of the desired implementation:
void StatItem::Reset()
{
std::queue<int> empty;
std::swap(_items, empty);
}
void StatItem::Insert(int val)
{
_items.push(val);
if (_items.size() == ITEMS_LIMIT)
{
_items.pop();
}
}
int StatItem::GetAverage() const
{
const size_t itemCount{ _items.size() };
if (itemCount == 0) {
return 0;
}
const int sum = std::accumulate(_items.begin(), _items.end(), 0); // Error: std::queue doesn't have these methods
return sum / itemCount;
}
Any ideas?
I'm not sure about std::deque. Is it efficient, and should I use it for this task, or something different?
P.S.: ITEMS_LIMIT in my case is about 100-500 items
The data structure you're looking for is a circular buffer. There is an implementation in the Boost library; however, since it doesn't seem you need to remove items in this situation, you can easily implement one using a std::vector or std::array.
You will need to keep track of the number of elements in the vector so far so that you can average correctly until you reach the element limit, and also the current insertion index which should just wrap when you reach that limit.
Using an array or vector will allow you to benefit from having a fixed element limit, as the elements will be stored in a single block of memory (good for fast memory access), and with both data structures you can make space for all elements you need on construction.
If you choose to use a std::vector, make sure to use the 'fill' constructor (http://www.cplusplus.com/reference/vector/vector/vector/), which will allow you to create the right number of elements from the beginning and avoid any extra allocations.
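A minimal sketch of the approach described above, assuming int values and a limit fixed at construction. The class name StatBuffer and its members are hypothetical and not taken from the question:

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

class StatBuffer {
public:
    // The 'fill' constructor allocates all slots up front.
    explicit StatBuffer(std::size_t limit) : items_(limit, 0) {}

    void Insert(int val) {
        if (items_.empty()) return;           // limit of 0: nothing to store
        items_[next_] = val;                  // overwrite the oldest slot
        next_ = (next_ + 1) % items_.size();  // wrap the insertion index
        if (count_ < items_.size()) ++count_;
    }

    int GetAverage() const {
        if (count_ == 0) return 0;
        // Until the buffer wraps, only the first count_ slots hold real values.
        const long long sum =
            std::accumulate(items_.begin(), items_.begin() + count_, 0LL);
        return static_cast<int>(sum / static_cast<long long>(count_));
    }

    void Reset() { next_ = 0; count_ = 0; }

private:
    std::vector<int> items_;
    std::size_t next_ = 0;   // current insertion index
    std::size_t count_ = 0;  // number of valid elements so far
};
```

Because the storage is an ordinary vector, any other method that needs access to every item can iterate over the first count_ elements directly.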

Iterating through a vector and deleting contents

Is it good or standard practice to loop through a vector like this, deleting unwanted elements from it, without losing performance? If there is a faster way, please suggest it.
This vector is of the form std::vector<AnimationState*> activeAnimations;
void AnimationPack::removeDeadAnimations()
{
int counter = 0;
std::remove_if(activeAnimations.begin(), activeAnimations.end(),
[&](AnimationState*& animation) {
if (animation->isActive())
{
counter++;
return true;
}
else
return false;
});
activeAnimations.erase(activeAnimations.end() - counter, activeAnimations.end());
}
Edited version
void AnimationPack::removeDeadAnimations()
{
activeAnimations.erase(std::remove_if(activeAnimations.begin(), activeAnimations.end(),
[&](AnimationState*& animation) {
if (animation->isActive())
return true;
else
return false;
}),activeAnimations.end());
}
Edited Code (As suggested from comments)
void AnimationPack::removeDeadAnimations()
{
activeAnimations.erase(std::remove_if(activeAnimations.begin(), activeAnimations.end(),
[](AnimationState*& animation) { return animation->isActive(); }), activeAnimations.end());
}
Yes, it is called the erase-remove idiom.
Quote from Wikipedia:
The erase–remove idiom is a common C++ technique to eliminate elements
that fulfill a certain criterion from a C++ Standard Library
container.
erase can be used to delete an element from a collection, but for
containers which are based on an array, such as vector, all elements
after the deleted element have to be moved forward, to avoid "gaps" in
the collection.
The algorithm library provides the remove and remove_if algorithms
for this.
These algorithms do not remove elements from the container, but move
all elements that don't fit the remove criteria to the front of the
range, keeping the relative order of the elements. This is done in a
single pass through the data range.
remove returns an iterator pointing to the first of these elements, so
they can be deleted using a single call to erase.
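As a small illustration of the idiom described in the quote, here is a sketch on a plain vector of ints. The function name remove_even is made up for this example:

```cpp
#include <algorithm>
#include <vector>

// Removes every even number from `values` using the erase-remove idiom.
void remove_even(std::vector<int>& values) {
    // Step 1: shift the elements to keep to the front, preserving their order.
    auto new_end = std::remove_if(values.begin(), values.end(),
                                  [](int v) { return v % 2 == 0; });
    // Step 2: erase the leftover tail with a single call.
    values.erase(new_end, values.end());
}
```

Note that the predicate returns true for the elements that should go away, not for the ones that should stay.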
This version removes the inactive elements from the vector and also deletes the objects they point to while iterating through it:
void AnimationPack::removeDeadAnimations()
{
activeAnimations.erase(std::remove_if(activeAnimations.begin(), activeAnimations.end(),
[&](AnimationState*& animation) {
if (animation->isActive())
return false;
else
{
delete animation;
return true;
}
}), activeAnimations.end());
}

Ordering output of C++ std::set

I want to list the output of my set in alphabetical order. Below is an attempt at getting to this, but it seems slow / inefficient and I haven't even finished it yet.
void ordered(ostream &os) {
bool inserted = false;
for (objects::iterator i = begin(); i != end(); ) {
for (objects::iterator x = begin(); x != end(); ++x) {
if((**i) < (**x)) { //overloaded and works
os << **i << endl;
inserted = true;
break;
}
}
if(inserted) {
++i;
}
}
}
Clearly this will only output objects that come after the first object alphabetically.
I also considered moving the objects from a set into another container but it still seems inefficient.
The std::set is an ordered container, see reference:
http://en.cppreference.com/w/cpp/container/set
std::set is an associative container that contains a sorted set of
unique objects of type Key. Sorting is done using the key comparison
function Compare. Search, removal, and insertion operations have
logarithmic complexity. Sets are usually implemented as red-black
trees.
std::set is already ordered. It looks like you merely need to use a custom comparer that compares the pointed-to values instead of the pointers themselves (which is the default):
template<typename T> struct pless {
inline bool operator()(const T* a, const T* b) const { return *a < *b; }
};
std::set<Foo*, pless<Foo> > objects;
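To see the effect of such a comparer, here is a minimal sketch that uses std::string in place of Foo. The helper in_set_order is a hypothetical name used only for this illustration:

```cpp
#include <set>
#include <string>
#include <vector>

template <typename T>
struct pless {
    bool operator()(const T* a, const T* b) const { return *a < *b; }
};

// Returns the pointed-to strings in the order the set iterates over them.
std::vector<std::string> in_set_order(const std::vector<std::string*>& ptrs) {
    std::set<std::string*, pless<std::string>> objects(ptrs.begin(), ptrs.end());
    std::vector<std::string> out;
    for (const std::string* p : objects) {
        out.push_back(*p);
    }
    return out;
}
```

One consequence to keep in mind: with this comparer two different pointers to equal values count as the same key, so only one of them ends up in the set.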

Queue with unique entries in c++

I need to implement a queue containing unique entries (no duplicates) in C or C++. I am thinking of maintaining a reference to the elements already in the queue, but that seems very inefficient.
Kindly let me know your suggestions to tackle this.
How about an auxiliary data structure to track uniqueness:
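std::queue<Foo> q;
// holds references into q; assumes Foo is comparable through a non-member operator<
std::set<std::reference_wrapper<const Foo>> s;
// to add:
void add(Foo const & x)
{
if (s.find(x) == s.end())
{
q.push(x); // std::queue has push, not push_back
s.insert(std::cref(q.back())); // or "s.emplace(q.back());"
}
}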
Or, alternatively, reverse the roles of the queue and the set:
std::set<Foo> s;
std::queue<std::reference_wrapper<const Foo>> q; // set elements are const
void add(Foo const & x)
{
auto p = s.insert(x); // std::pair<std::set<Foo>::iterator, bool>
if (p.second)
{
q.push(std::cref(*p.first)); // or "q.emplace(*p.first);"
}
}
queuing:
use std::set to maintain your set of unique elements
add any element that you were able to add to the std::set to the std::queue
dequeueing:
remove element from std::queue and std::set
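The two steps above can be sketched as a small class. This is only an illustration: UniqueQueue is a hypothetical name, T is assumed to be copyable and comparable with operator<, and for simplicity each value is stored twice (once in the set, once in the queue):

```cpp
#include <cstddef>
#include <queue>
#include <set>

template <typename T>
class UniqueQueue {
public:
    // Queuing: returns true if the value was enqueued,
    // false if it is already waiting in the queue.
    bool push(const T& value) {
        if (!seen_.insert(value).second) return false;
        order_.push(value);
        return true;
    }

    // Dequeuing: removes the element from both structures.
    // Precondition: !empty()
    void pop() {
        seen_.erase(order_.front());
        order_.pop();
    }

    const T& front() const { return order_.front(); }
    bool empty() const { return order_.empty(); }
    std::size_t size() const { return order_.size(); }

private:
    std::set<T> seen_;     // tracks which values are currently queued
    std::queue<T> order_;  // keeps FIFO order
};
```

Once an element has been dequeued it can be queued again, since it is erased from the set as well.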
std::queue is a container adaptor and uses relatively few members of the underlying Container. You can easily implement a custom container that contains both an unordered_set of reference_wrapper<T> and a deque<T>. It needs at least the members front and push_back. Check inside that unordered_set when push_back of your container is called and reject accordingly (possibly throw). To give the complete example:
#include <iostream>
#include <set>
#include <deque>
#include <queue>
#include <unordered_set>
#include <functional>
namespace std {
// partial specialization for reference_wrapper
// is this really necessary?
template<typename T>
class hash<std::reference_wrapper<T>> {
public:
std::size_t operator()(std::reference_wrapper<T> x) const
{ return std::hash<T>()(x.get()); }
};
}
template <typename T>
class my_container {
// important: this really needs to be a deque, and insertion/deletion
// is only allowed at the ends, to not get dangling references
typedef std::deque<T> storage;
typedef std::reference_wrapper<const T> c_ref_w;
typedef std::reference_wrapper<T> ref_w;
public:
typedef typename storage::value_type value_type;
typedef typename storage::reference reference;
typedef typename storage::const_reference const_reference;
typedef typename storage::size_type size_type;
// no move semantics
void push_back(const T& t) {
auto it = lookup_.find(std::cref(t));
if(it != end(lookup_)) {
// is already inserted report error
return;
}
store_.push_back(t);
// this is important to not have dangling references
lookup_.insert(store_.back());
}
// trivial functions
bool empty() const { return store_.empty(); }
const T& front() const { return store_.front(); }
T& front() { return store_.front(); }
void pop_front() { lookup_.erase(store_.front()); store_.pop_front(); }
private:
// look-up mechanism
std::unordered_set<c_ref_w> lookup_;
// underlying storage
storage store_;
};
int main()
{
// reference wrapper for int ends up being silly
// but good for larger objects
std::queue<int, my_container<int>> q;
q.push(2);
q.push(3);
q.push(2);
q.push(4);
while(!q.empty()) {
std::cout << q.front() << std::endl;
q.pop();
}
return 0;
}
EDIT: You will want to make my_container a proper model of container (maybe also allocators), but this is another full question. Thanks to Christian Rau for pointing out bugs.
There is one very important point you've not mentioned in your question, and that is whether your queue of items is sorted or has some kind of ordering (called a priority queue), or is unsorted (called a plain FIFO). The solution you choose will depend only on the answer to this question.
If your queue is unsorted, then maintaining an extra data structure in addition to your queue will be more efficient. Using a second structure which is ordered in some way to maintain the contents of your queue will allow you to check whether an item already exists in your queue much quicker than scanning the queue itself. Adding to the end of an unsorted queue takes constant time and can be done very efficiently.
If your queue must be sorted, then placing the item into the queue requires you to know the item's position in the queue, which requires the queue to be searched anyway. Once you know an item's position, you know whether the item is a duplicate, because a duplicate will already exist at that position in the queue. In this case, all work can be performed optimally on the queue itself and maintaining any secondary data structure is unnecessary.
The choice of data structures is up to you. However, for the unsorted case the secondary data structure should not be any kind of list or array, otherwise scanning your secondary index will be no more efficient than scanning the original queue itself.
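For the sorted case described above, a minimal sketch could let a std::set play the role of the queue, since inserting into it finds the position and detects the duplicate in one step. SortedUniqueQueue is a hypothetical name, and the sketch assumes that the smallest element is the one to be dequeued first:

```cpp
#include <set>

template <typename T>
class SortedUniqueQueue {
public:
    // Returns false if an equal item is already queued.
    bool push(const T& value) { return items_.insert(value).second; }

    // Smallest item first. Precondition for front and pop: !empty()
    const T& front() const { return *items_.begin(); }
    void pop() { items_.erase(items_.begin()); }

    bool empty() const { return items_.empty(); }

private:
    std::set<T> items_;  // sorted and duplicate-free by construction
};
```

No secondary structure is needed here, which matches the argument made for the sorted case.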

std::sort is slow with small amounts of data

I'm finding that std::sort is very slow with sorting only 1000 items.
In class template template <typename T> class TableModel : public QAbstractTableModel I have the following function to sort a table.
template<typename T>
void TableModel<T>::sort(int column, Qt::SortOrder order = Qt::AscendingOrder) {
if(order == Qt::AscendingOrder) {
qSort(m_list.begin(), m_list.end(), less<T>(column));
} else {
qSort(m_list.begin(), m_list.end(), greater<T>(column));
}
reset();
}
I noticed that if I only randomly shuffle my table, it shuffles and then displays instantly. This leads me to think that it is the sort that is slow. Can anyone help me speed up the sorting of a QTable?
Here is the less struct.
template<typename T>
struct less {
int index;
less(int index) : index(index) {}
bool operator()(const T& first, const T& second) {
return T::less(first, second, index);
}
};
T::less is a function, and all it does is the less-than comparison based on the index given.
Slow is defined as 5 seconds for only 1000 items, when I need to handle about 100,000 items later on.
I suspect that m_list is storing the items by value and that swapping them is expensive. You could try to either implement a faster swap or store them in the container by smart pointer.
Of course a profiler could help you pinpoint the problem much more precisely.
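To illustrate the smart pointer suggestion, here is a minimal sketch using the standard library rather than Qt containers. Row is a hypothetical stand-in for the element type T, and sort_by_column is a made-up helper. Sorting then only swaps pointers, however large each row is:

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for the table's element type.
struct Row {
    std::vector<int> cells;
};

// Sorts the rows in ascending order of the given column.
// Only the unique_ptr values are swapped; no Row is copied.
// Assumption: every row has more than `column` cells.
void sort_by_column(std::vector<std::unique_ptr<Row>>& rows, std::size_t column) {
    std::sort(rows.begin(), rows.end(),
              [column](const std::unique_ptr<Row>& a,
                       const std::unique_ptr<Row>& b) {
                  return a->cells[column] < b->cells[column];
              });
}
```

If the comparison itself is the expensive part, this will not help, which is another reason to profile first.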
Since m_list is a QList it does not have the same interface or performance characteristics as a normal list. For example, apparently a QList stores an array of T* internally. This representation could be sorted without any copying if the sort algorithm is aware of this implementation detail. By contrast std::sort is probably deep copying the values around, or maybe moving them, which is going to be more work than sorting pointers in the QList array.
It's probably best to use Qt containers with Qt algorithms, since Qt algorithms are more likely to be specialized for Qt containers. Or you could avoid using Qt containers and just stick with the standard library.
Anyway, try using Qt's qSort algorithm:
template<typename T>
void TableModel<T>::sort(int column, Qt::SortOrder order = Qt::AscendingOrder) {
if(order == Qt::AscendingOrder) {
qSort(m_list.begin(), m_list.end(), less<T>(column));
} else {
qSort(m_list.begin(), m_list.end(), greater<T>(column));
}
reset();
}
Original answer
std::sort can't take advantage of the fact that nodes in the list can be moved around without copying the element. Assuming you're using std::list or something similar, use the sort member function.
template<typename T>
void TableModel<T>::sort(int column, Qt::SortOrder order = Qt::AscendingOrder) {
std::random_shuffle(m_list.begin(), m_list.end());
if(order == Qt::AscendingOrder) {
m_list.sort(less<T>(column));
} else {
m_list.sort(greater<T>(column));
}
reset();
}
If you can't do that then you may be able to optimize all those copies by making sure that your elements are move-enabled if you're using C++11.