std::sort is slow with small amounts of data


I'm finding that std::sort is very slow with sorting only 1000 items.
In class template template <typename T> class TableModel : public QAbstractTableModel I have the following function to sort a table.
template<typename T>
void TableModel<T>::sort(int column, Qt::SortOrder order = Qt::AscendingOrder) {
if(order == Qt::AscendingOrder) {
std::sort(m_list.begin(), m_list.end(), less<T>(column));
} else {
std::sort(m_list.begin(), m_list.end(), greater<T>(column));
}
reset();
}
I notice that if I only randomly shuffle the table, it shuffles and then displays instantly. This leads me to think that it is the sort that is slow. Can anyone help me speed up the sorting of a QTable?
Here is the less struct.
template<typename T>
struct less {
int index;
less(int index) : index(index) {}
bool operator()(const T& first, const T& second) const {
return T::less(first, second, index);
}
};
T::less is a function; all it does is the less-than comparison based on the given index.
Slow is defined as 5 seconds for only 1000 items, when I need to handle about 100,000 items later on.

I suspect that m_list is storing the items by value and that swapping them is expensive. You could try to either implement a faster swap or store them in the container by smart pointer.
Of course a profiler could help you pinpoint the problem much more precisely.
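To make the "sort handles instead of values" idea concrete, here is a minimal sketch. Row, RowPtrLess and sortedView are hypothetical names, not taken from the question, and the sketch assumes the rows really are expensive to copy, which only a profiler can confirm.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical row type standing in for T; assumed to be expensive to copy.
struct Row {
    std::vector<std::string> cells;
};

// Compares two rows through pointers, using the given column.
struct RowPtrLess {
    std::size_t column;
    explicit RowPtrLess(std::size_t c) : column(c) {}
    bool operator()(const Row* a, const Row* b) const {
        return a->cells[column] < b->cells[column];
    }
};

// Builds a vector of pointers into `rows` and sorts only the pointers.
// The rows themselves are never copied or swapped.
std::vector<const Row*> sortedView(const std::vector<Row>& rows, std::size_t column) {
    std::vector<const Row*> view;
    view.reserve(rows.size());
    for (std::size_t i = 0; i < rows.size(); ++i) {
        view.push_back(&rows[i]);
    }
    std::sort(view.begin(), view.end(), RowPtrLess(column));
    return view;
}
```

The pointers stay valid only as long as the underlying vector is not resized, so a model would have to rebuild the view after inserting or removing rows.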

Since m_list is a QList it does not have the same interface or performance characteristics as a normal list. For example, apparently a QList stores an array of T* internally. This representation could be sorted without any copying if the sort algorithm is aware of this implementation detail. By contrast std::sort is probably deep copying the values around, or maybe moving them, which is going to be more work than sorting pointers in the QList array.
It's probably best to use Qt containers with Qt algorithms, since Qt algorithms are more likely to be specialized for Qt containers. Or you could avoid using Qt containers and just stick with the standard library.
Anyway, try using Qt's qSort algorithm:
template<typename T>
void TableModel<T>::sort(int column, Qt::SortOrder order = Qt::AscendingOrder) {
if(order == Qt::AscendingOrder) {
qSort(m_list.begin(), m_list.end(), less<T>(column));
} else {
qSort(m_list.begin(), m_list.end(), greater<T>(column));
}
reset();
}
Original answer
std::sort can't take advantage of the fact that nodes in the list can be moved around without copying the element. Assuming you're using std::list or something similar, use the sort member function.
template<typename T>
void TableModel<T>::sort(int column, Qt::SortOrder order = Qt::AscendingOrder) {
if(order == Qt::AscendingOrder) {
m_list.sort(less<T>(column));
} else {
m_list.sort(greater<T>(column));
}
reset();
}
If you can't do that then you may be able to optimize all those copies by making sure that your elements are move-enabled if you're using C++11.

Related

c++ how to shrink std::vector to a subselection efficiently

given two vectors
std::vector<SomeStruct> items; //1'000'000 items
std::vector<int> selection; //900'000 unique indices in ascending order
where selection contains valid indices into items, how can I shrink items efficiently to only contain the elements that are initially indexed by selection?

I am going to write this answer in reverse. Bear with me, I hope you will understand.
Let's first write a wrapper that lets us iterate over only the selected items:
#include <iostream>
#include <vector>
struct SomeStruct {};
struct selected_item {
std::vector<SomeStruct>& items;
std::vector<size_t>& selection;
struct iterator {
std::vector<SomeStruct>& items;
std::vector<size_t>::iterator selection_iterator;
SomeStruct& operator *(){
return items[*selection_iterator];
}
iterator& operator++(){
++selection_iterator;
return *this;
}
bool operator!=(const iterator& other) const {
return selection_iterator != other.selection_iterator;
}
};
iterator begin() { return {items,selection.begin()}; }
iterator end() { return {items,selection.end()};}
};
int main() {
std::vector<SomeStruct> items{{},{},{},{}};
std::vector<size_t> selection{1,3};
for (auto& i : selected_item{items,selection}){
std::cout << "item selected\n";
}
}
Using that you can now write a loop that moves selected items from items into a new vector, then move that new vector into items:
int main() {
std::vector<SomeStruct> items{{},{},{},{}};
std::vector<size_t> selection{1,3};
std::vector<SomeStruct> temp_items;
temp_items.reserve(selection.size());
for (auto& i : selected_item{items,selection}){
temp_items.emplace_back(std::move(i));
}
items = std::move(temp_items);
}
Provided SomeStruct can be moved, this will not copy any SomeStruct. However, moving is not free either. Depending on why you actually want to remove elements from items (why not populate a vector of selected items in the first place, instead of populating a vector of indices?), you can also consider skipping the move altogether and using only the above wrapper to do whatever you want with the selected items. As 90% of the items are selected, the savings in memory and the more efficient element access (due to a smaller vector) may not outweigh the cost of moving, so you might as well directly do:
int main() {
std::vector<SomeStruct> items{{},{},{},{}};
std::vector<size_t> selection{1,3};
for (auto& i : selected_item{items,selection}){
do_something_with_selected_item(i);
}
}
Another option would be to actually erase elements from items. I did not consider it because I expect it to be rather costly. I might be wrong about that. As always, to know what is more efficient you need to measure.
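For comparison, here is a minimal sketch of a variant that compacts items in place instead of erasing elements one by one or building a temporary vector. This is a different technique from the wrapper above. keepSelected is a hypothetical name, and the sketch relies on the question's guarantee that selection holds unique, valid indices in ascending order. Whether it beats the temporary-vector version is something you would have to measure.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Keeps only the elements of `items` whose indices appear in `selection`.
// Assumes `selection` holds unique, valid indices in ascending order.
template <typename T>
void keepSelected(std::vector<T>& items, const std::vector<std::size_t>& selection) {
    std::size_t out = 0;
    for (std::size_t k = 0; k < selection.size(); ++k) {
        const std::size_t in = selection[k];
        // Ascending unique indices guarantee in >= out, so a slot is never
        // overwritten before it has been read.
        if (in != out) {
            items[out] = std::move(items[in]);
        }
        ++out;
    }
    // Drop everything after the last kept element in a single erase call.
    items.erase(items.begin() + static_cast<std::ptrdiff_t>(out), items.end());
}
```

Each kept element is moved at most once, and the vector's capacity is left unchanged.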
PS: The wrapper is tested with gcc. I find it a little annoying to write custom iterators, and I am not sure whether it needs, e.g., an operator== or a post-increment. I only implemented what was necessary to make gcc happy.

Deleting element in priority queue other than top element in C++

Is there any inbuilt function for deleting a given element (other than the top element) in the priority queue class of the C++ STL? If not, how can I delete it in O(log n)? Should I implement the heap data structure from scratch for this 'delete' functionality?

Is there any inbuilt function for deleting a given element (other than top element) in priority queue class of C++ STL?
No.
If not how to delete it in O(log n)?
By using another container. std::set is the simplest compromise. A custom heap implementation may be more optimal.
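As a rough illustration of the "another container" route, here is a minimal sketch. ErasableQueue is a hypothetical name, and the sketch uses std::multiset rather than std::set (an assumption, so that duplicate priorities are allowed).

```cpp
#include <functional>
#include <set>

// A max-priority "queue" built on std::multiset, ordered largest first.
class ErasableQueue {
public:
    void push(int value) { items_.insert(value); }

    // Largest element; the container must not be empty.
    int top() const { return *items_.begin(); }

    // Removes the largest element; the container must not be empty.
    void pop() { items_.erase(items_.begin()); }

    // Removes one occurrence of `value` in O(log n). Returns false if absent.
    bool remove(int value) {
        std::multiset<int, std::greater<int> >::iterator it = items_.find(value);
        if (it == items_.end()) {
            return false;
        }
        items_.erase(it);
        return true;
    }

    bool empty() const { return items_.empty(); }

private:
    std::multiset<int, std::greater<int> > items_;
};
```

The trade-off is one node allocation per element, so pushes and pops are usually slower than with the contiguous heap inside std::priority_queue.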

There is no inbuilt function for deleting a given element (other than the top element) in a priority queue.
I would recommend std::set, which performs insertion, lookup, and removal in O(log N) using a balanced binary tree. std::unordered_set offers O(1) average-time operations through hashing, but it keeps no ordering, so finding the top element becomes a linear scan.
So my advice is to use std::set (or std::unordered_set if you do not need the ordering) rather than restricting yourself to a priority queue.

As suggested by this solution, you can do something like this:
#include <algorithm>
#include <queue>
#include <utility>
#include <vector>
template<typename T>
class custom_priority_queue : public std::priority_queue<T, std::vector<T>>
{
public:
template< typename UnaryPredicate >
T pop_match_or_top(UnaryPredicate p) {
auto it = std::find_if(this->c.begin(), this->c.end(), p);
if (it != this->c.end()) {
T value = std::move(*it);
this->c.erase(it);
std::make_heap(this->c.begin(), this->c.end(), this->comp);
return value;
}
else {
T value = this->top();
this->pop();
return value;
}
}
};
This is especially useful when you need to take elements that are close to the top but are not exactly the top. Note that std::find_if and std::make_heap are both linear, so this removal is O(n), not O(log n).

C++ N-last added items container

I am trying to find the best data structure for the following simple task: a class that keeps the last N added values in a built-in container. When the object receives item N+1, that item should be added at the end of the container and the first item should be removed. It is like a simple queue, but the class should have a GetAverage method and other methods that need access to every item. Unfortunately, std::queue doesn't have begin and end methods for this purpose.
Here is part of the class interface:
class StatItem final
{
static int ITEMS_LIMIT;
public:
StatItem() = default;
~StatItem() = default;
void Reset();
void Insert(int val);
int GetAverage() const;
private:
std::queue<int> _items;
};
And part of desired implementation:
void StatItem::Reset()
{
std::queue<int> empty;
std::swap(_items, empty);
}
void StatItem::Insert(int val)
{
_items.push(val);
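// Pop only once the limit is exceeded, so that exactly ITEMS_LIMIT items are kept.
if (_items.size() > static_cast<size_t>(ITEMS_LIMIT))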
{
_items.pop();
}
}
int StatItem::GetAverage() const
{
const size_t itemCount{ _items.size() };
if (itemCount == 0) {
return 0;
}
const int sum = std::accumulate(_items.begin(), _items.end(), 0); // Error: std::queue has no begin() or end()
return sum / itemCount;
}
Any ideas?
I'm not sure about std::deque. Is it efficient, and should I use it for this task, or something different?
P.S.: ITEMS_LIMIT in my case is about 100-500 items.

The data structure you're looking for is a circular buffer. There is an implementation in the Boost library; however, since it doesn't seem you need to remove items in this situation, you can easily implement one using a std::vector or std::array.
You will need to keep track of the number of elements in the vector so far so that you can average correctly until you reach the element limit, and also the current insertion index which should just wrap when you reach that limit.
Using an array or vector will allow you to benefit from having a fixed element limit, as the elements will be stored in a single block of memory (good for fast memory access), and with both data structures you can make space for all elements you need on construction.
If you choose to use a std::vector, make sure to use the 'fill' constructor (http://www.cplusplus.com/reference/vector/vector/vector/), which will allow you to create the right number of elements from the beginning and avoid any extra allocations.
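The description above can be sketched roughly as follows. LastN is a hypothetical class name, the limit is passed to the constructor instead of living in a static member, and the method names simply mirror the interface from the question. Treat it as one possible layout, not the only one.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Keeps the last `limit` inserted values in a fixed-size ring buffer.
class LastN {
public:
    // The fill constructor allocates all slots up front.
    explicit LastN(std::size_t limit) : items_(limit, 0), count_(0), next_(0) {}

    void Insert(int val) {
        if (items_.empty()) return;            // a limit of 0 stores nothing
        items_[next_] = val;                   // overwrite the oldest slot
        next_ = (next_ + 1) % items_.size();   // wrap around at the limit
        if (count_ < items_.size()) ++count_;
    }

    void Reset() {
        count_ = 0;
        next_ = 0;
    }

    int GetAverage() const {
        if (count_ == 0) return 0;
        // Until the buffer has filled once, only the first count_ slots hold data.
        const long long sum = std::accumulate(
            items_.begin(), items_.begin() + static_cast<std::ptrdiff_t>(count_), 0LL);
        return static_cast<int>(sum / static_cast<long long>(count_));
    }

private:
    std::vector<int> items_;
    std::size_t count_;  // number of valid elements, at most items_.size()
    std::size_t next_;   // index of the next slot to write
};
```

Since the average does not depend on element order, the buffer never needs to be "unrolled". Methods that care about insertion order would start reading at next_ once the buffer is full.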

Ordering output of C++ std::set

I want to list the output of my set in alphabetical order. Below is an attempt at getting to this, but it seems slow / inefficient and I haven't even finished it yet.
void ordered(ostream &os) {
bool inserted = false;
for (objects::iterator i = begin(); i != end(); ) {
for (objects::iterator x = begin(); x != end(); ++x) {
if((**i) < (**x)) { //overloaded and works
os << **i << endl;
inserted = true;
break;
}
}
if(inserted) {
++i;
}
}
}
Clearly this will only output objects that come after the first object alphabetically.
I also considered moving the objects from a set into another container but it still seems inefficient.

The std::set is an ordered container, see reference:
http://en.cppreference.com/w/cpp/container/set
std::set is an associative container that contains a sorted set of
unique objects of type Key. Sorting is done using the key comparison
function Compare. Search, removal, and insertion operations have
logarithmic complexity. Sets are usually implemented as red-black
trees.

std::set is already ordered. It looks like you merely need to use a custom comparer that compares the pointed-to values instead of the pointers themselves (which is the default):
template<typename T> struct pless {
inline bool operator()(const T* a, const T* b) const { return *a < *b; }
};
std::set<Foo*, pless<Foo> > objects;

Overriding QAbstractItemModel::index and accessing std::map

In my program I want to use view/model pattern with view = QListView and my own model which I subclassed from QAbstractListModel. My data class looks like
class Avtomat
{
...
map<QString, State *> states;
...
};
In my model class
class AvtomatModel : public QAbstractListModel
{
...
Avtomat a;
...
};
I'm trying to override the QAbstractItemModel::index function so that I can provide an interface for editing the data map.
As the index function takes an int row argument, I solved that problem by providing the following:
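State* Avtomat::pStateFromIndex(int index) const
{
map<QString, State *>::const_iterator i = states.begin();
// Walk forward `index` steps, stopping early at the end of the map.
for (int count = 0; i != states.end() && count != index; ++i)
++count;
// Guard against an out-of-range index instead of dereferencing end().
return (i != states.end()) ? i->second : NULL;
}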
so in my index function I do like this
return createIndex(row, column, a.pStateFromIndex(row));
but that seems pretty ugly because I have O(n). Can you help me to design a better way to access my map using int index?

This is a fundamental data modelling issue. What's the primary way you need to retrieve your data? By key or by index?
If you only ever access it by index (including in the model) then you're simply using an inappropriate data structure and should switch to something else like a list.
If you do need to query by key too then you have several options. There's nothing wrong with what you're doing already if efficiency isn't a huge driver, especially if the data set is small. Alternatively you could also maintain both key and index mappings to your underlying data. This is a simple and effective solution but it means you have to take the hit of managing consistency between the two and has a memory overhead which may be significant if your data set is large. Or you could use a data structure that provides access by both key and index directly. Ultimately it depends on your specific circumstances and the data domain you're working with.
There's a good summary of the Qt container classes (along with the std containers) in the documentation. The section on algorithmic complexity may be particularly interesting to you.

Another option is to use a vector to hold the data as key-value pairs. The vector can then be accessed by index or by key.
The disadvantage is that inserting into a vector is expensive relative to a std::map.
typedef std::pair<QString, State*> StateP;
typedef std::vector<StateP> States;
States states;
Then maintain the vector in sorted order based on a predicate that compares the first element. You can then look up items by index in O(1) or by key in O(log n).
struct StatePCompare {
bool operator()(StateP const& lhs, StateP const& rhs) const {
return (lhs.first < rhs.first);
}
};
void Avtomat::insert(QString key, State* state)
{
// Probe pair carrying the key; its State* is ignored by StatePCompare.
StateP probe(key, static_cast<State*>(NULL));
States::iterator i = std::lower_bound(states.begin(), states.end(), probe, StatePCompare());
if (i != states.end() && i->first == key) {
// key already exists, set the element
i->second = state;
}
else {
// insert before i to keep the vector sorted by key
states.insert(i, StateP(key, state));
}
}
State* Avtomat::find(QString key)
{
StateP probe(key, static_cast<State*>(NULL));
States::iterator i = std::lower_bound(states.begin(), states.end(), probe, StatePCompare());
if (i != states.end() && i->first == key) {
return i->second;
}
return NULL;
}
