I wonder if anybody could help me out.
I am looking for a data structure (such as a list, queue, stack, array, vector, binary tree, etc.) supporting these four operations:
isEmpty (true/false)
insert single element
pop (i.e. get&remove) single element
split into two structures, e.g. take approximately half (let's say +/- 20%) of the elements and move them to another structure
Note that I don't care about order of elements at all.
Insert/pop example:
A.insert(1), A.insert(2), A.insert(3), A.insert(4), A.insert(5) // contains 1,2,3,4,5 in any order
A.pop() // 3
A.pop() // 2
A.pop() // 5
A.pop() // 1
A.pop() // 4
and the split example:
A.insert(1), A.insert(2), A.insert(3), A.insert(4), A.insert(5)
A.split(B)
// A = {1,4,3}, B={2,5} in any order
I need the structure to be as fast as possible - preferably all four operations in O(1). I doubt it has already been implemented in the std library, so I will implement it myself (in C++11, so std::move can be used).
Note that insert, pop and isEmpty are called about ten times more frequently than split.
I tried some coding with list and vector but with no success:
#include <vector>
#include <iostream>

// g++ -Wall -g -std=c++11
/*
output:
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
5 6 7 8 9
*/
int main()
{
    std::vector<int> v1;
    for (int i = 0; i < 10; ++i) v1.push_back(i);
    for (auto i : v1) std::cout << i << " ";
    std::cout << std::endl;

    auto halfway = v1.begin() + v1.size() / 2;
    auto endItr = v1.end();
    std::vector<int> v2;
    v2.insert(v2.end(),
              std::make_move_iterator(halfway),
              std::make_move_iterator(endItr));

    // sigsegv
    /*
    auto halfway2 = v1.begin() + v1.size() / 2;
    auto endItr2 = v1.end();
    v2.erase(halfway2, endItr2);
    */

    for (auto i : v1) std::cout << i << " ";
    std::cout << std::endl;
    for (auto i : v2) std::cout << i << " ";
    std::cout << std::endl;
    return 0;
}
Any sample code, ideas, links or whatever useful? Thanks
Related literature:
How to move the later half of a vector into another vector? (actually does not work due to the deletion problem)
http://www.cplusplus.com/reference/iterator/move_iterator/
Your problems with the deletion are due to a bug in your code.
// sigsegv
auto halfway2 = v1.begin() + v1.size() / 2;
auto endItr2 = v1.end();
v2.erase(halfway2, endItr2);
You try to erase from v2 with iterators pointing into v1. That won't work, and you probably wanted to call erase on v1.
That fixes your deletion problem when splitting the vector, and vector seems to be the best container for what you want.
Note that everything except split can be done in O(1) on a vector if you insert at the end only; since order doesn't matter to you, I don't see any problem with that. Split would be O(n) in your implementation once fixed, but that should be pretty fast, since the data sits right next to each other in the vector, which is very cache-friendly.
I can’t think of a solution with all operations in O(1).
With a list you can have push and pop in O(1), and split in O(n) (because you need to find the middle of the list).
With a balanced binary tree (not a search tree) you can have all operations in O(log n).
edit
There have been some suggestions that keeping track of the middle of the list would make split O(1). This is not the case: when you split the list, you have to find the middle of the left half and the middle of the right half, resulting in O(n).
Some other suggestion is that a vector is preferred simply because it is cache-friendly. I totally agree with this.
For fun, I implemented a balanced binary tree container that performs all operations in O(log n). The insert and pop are obviously O(log n). The actual split is O(1); however, we are left with the root node, which we have to insert into one of the halves, resulting in O(log n) for split as well. No copying is involved, however.
Here is my attempt at the said container (I haven't thoroughly tested it for correctness, and it can be further optimized, e.g. by transforming the recursion into a loop).
#include <memory>
#include <iostream>
#include <utility>
#include <stdexcept> // std::logic_error

template <class T>
class BalancedBinaryTree {
private:
    class Node;
    std::unique_ptr<Node> root_;

public:
    void insert(const T &data) {
        if (!root_) {
            root_ = std::unique_ptr<Node>(new Node(data));
            return;
        }
        root_->insert(data);
    }

    std::size_t getSize() const {
        if (!root_) {
            return 0;
        }
        return 1 + root_->getLeftCount() + root_->getRightCount();
    }

    // Tree must not be empty!!
    T pop() {
        if (root_->isLeaf()) {
            T temp = root_->getData();
            root_ = nullptr;
            return temp;
        }
        return root_->pop()->getData();
    }

    BalancedBinaryTree split() {
        if (!root_) {
            return BalancedBinaryTree();
        }
        BalancedBinaryTree left_half;
        T root_data = root_->getData();
        bool left_is_bigger = root_->getLeftCount() > root_->getRightCount();
        left_half.root_ = std::move(root_->getLeftChild());
        root_ = std::move(root_->getRightChild());
        if (left_is_bigger) {
            insert(root_data);
        } else {
            left_half.insert(root_data);
        }
        return left_half; // no std::move: allow NRVO / implicit move
    }
};

template <class T>
class BalancedBinaryTree<T>::Node {
private:
    T data_;
    std::unique_ptr<Node> left_child_, right_child_;
    std::size_t left_count_ = 0;
    std::size_t right_count_ = 0;

public:
    Node() = default;
    Node(const T &data, std::unique_ptr<Node> left_child = nullptr,
         std::unique_ptr<Node> right_child = nullptr)
        : data_(data), left_child_(std::move(left_child)),
          right_child_(std::move(right_child)) {
    }

    bool isLeaf() const {
        return left_count_ + right_count_ == 0;
    }

    const T& getData() const {
        return data_;
    }
    T& getData() {
        return data_;
    }

    std::size_t getLeftCount() const {
        return left_count_;
    }
    std::size_t getRightCount() const {
        return right_count_;
    }

    std::unique_ptr<Node> &getLeftChild() {
        return left_child_;
    }
    const std::unique_ptr<Node> &getLeftChild() const {
        return left_child_;
    }
    std::unique_ptr<Node> &getRightChild() {
        return right_child_;
    }
    const std::unique_ptr<Node> &getRightChild() const {
        return right_child_;
    }

    void insert(const T &data) {
        if (left_count_ <= right_count_) {
            ++left_count_;
            if (left_child_) {
                left_child_->insert(data);
            } else {
                left_child_ = std::unique_ptr<Node>(new Node(data));
            }
        } else {
            ++right_count_;
            if (right_child_) {
                right_child_->insert(data);
            } else {
                right_child_ = std::unique_ptr<Node>(new Node(data));
            }
        }
    }

    std::unique_ptr<Node> pop() {
        if (isLeaf()) {
            throw std::logic_error("pop invalid path");
        }
        if (left_count_ > right_count_) {
            --left_count_;
            if (left_child_->isLeaf()) {
                return std::move(left_child_);
            }
            return left_child_->pop();
        }
        --right_count_;
        if (right_child_->isLeaf()) {
            return std::move(right_child_);
        }
        return right_child_->pop();
    }
};
usage:
BalancedBinaryTree<int> t;
BalancedBinaryTree<int> t2;
t.insert(3);
t.insert(7);
t.insert(17);
t.insert(37);
t.insert(1);
t2 = t.split();
while (t.getSize() != 0) {
    std::cout << t.pop() << " ";
}
std::cout << std::endl;
while (t2.getSize() != 0) {
    std::cout << t2.pop() << " ";
}
std::cout << std::endl;
output:
1 17
3 37 7
If the number of elements/bytes stored at any one time in your container is large, the solution of Youda008 (using a list and keeping track of the middle) may not be as efficient as you hope.
Alternatively, you could have a list<vector<T>> or even list<array<T,Capacity>> and keep track of the middle of the list, i.e. split only between two sub-containers, but never split a sub-container. This should give you both O(1) on all operations and reasonable cache efficiency. Use array<T,Capacity> if a single value for Capacity serves your needs at all times (for Capacity=1, this reverts to an ordinary list).
Otherwise, use vector<T> and adapt the capacity for new vectors according to demand.
bolov points out correctly that finding the middles of the lists that emerge from splitting one list is not O(1). This implies that keeping track of the middle is not useful for a strict O(1) bound. However, using a list<sub_container> is still faster than a plain list, because the split costs only O(n/Capacity), not O(n). The price you pay is that the split has a granularity of Capacity rather than 1; thus you must compromise between the accuracy and the cost of a split.
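A sketch of that chunked idea (the name `ChunkedBag` and the default Capacity are made up for illustration): a std::list of std::vector chunks, where split only splices whole chunks across, so it touches O(n/Capacity) list nodes rather than O(n) elements:

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical chunked container: order-agnostic insert/pop in O(1) amortized,
// split in O(n/Capacity) steps (only whole chunks are moved, never split).
template <typename T, std::size_t Capacity = 64>
class ChunkedBag {
    std::list<std::vector<T>> chunks_;
    std::size_t size_ = 0;
public:
    bool isEmpty() const { return size_ == 0; }
    std::size_t size() const { return size_; }

    void insert(const T& v) {
        if (chunks_.empty() || chunks_.back().size() == Capacity)
            chunks_.emplace_back();
        chunks_.back().push_back(v);
        ++size_;
    }

    T pop() { // precondition: !isEmpty()
        T v = chunks_.back().back();
        chunks_.back().pop_back();
        if (chunks_.back().empty()) chunks_.pop_back();
        --size_;
        return v;
    }

    void split(ChunkedBag& other) { // move roughly half the chunks to other
        auto mid = chunks_.begin();
        std::advance(mid, chunks_.size() / 2);
        std::size_t moved = 0;
        for (auto it = mid; it != chunks_.end(); ++it) moved += it->size();
        other.chunks_.splice(other.chunks_.end(), chunks_, mid, chunks_.end());
        other.size_ += moved;
        size_ -= moved;
    }
};
```

With Capacity = 1 this degenerates to a plain list; a larger Capacity trades split granularity (roughly +/- Capacity elements) for cache-friendly insert/pop.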
Another option is to implement your own container using a linked list plus a pointer to the middle element at which you want to split. This pointer is updated on every modifying operation. This way you can achieve O(1) complexity on all operations.
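A sketch of that idea (the class name `MiddleTrackedBag` is made up, and copying is deliberately not handled): maintain the size plus an iterator at index size/2, shifting it by one on each insert/pop, so the split itself is a single std::list::splice. Note bolov's caveat above: after a split, the two halves' middle iterators must be re-found, which costs O(n) when done by walking, as here:

```cpp
#include <cstddef>
#include <iterator>
#include <list>

// Hypothetical bag keeping an iterator near the middle so split is one splice.
template <typename T>
class MiddleTrackedBag {
    std::list<T> items_;
    typename std::list<T>::iterator mid_ = items_.end(); // element at index size_/2
    std::size_t size_ = 0;
public:
    bool isEmpty() const { return size_ == 0; }
    std::size_t size() const { return size_; }

    void insert(const T& v) {
        items_.push_back(v);
        ++size_;
        if (size_ == 1) mid_ = items_.begin();
        else if (size_ % 2 == 0) ++mid_; // keep mid_ at index size_/2
    }

    T pop() { // pops from the back; precondition: !isEmpty()
        if (size_ % 2 == 0) --mid_; // new middle index is (size_-1)/2
        T v = items_.back();
        items_.pop_back();
        --size_;
        if (size_ == 0) mid_ = items_.end();
        return v;
    }

    void split(MiddleTrackedBag& other) {
        std::size_t moved = size_ - size_ / 2;
        other.items_.splice(other.items_.end(), items_, mid_, items_.end()); // O(1)
        size_ -= moved;
        other.size_ += moved;
        // bolov's caveat: re-finding the two new middles costs O(n) by walking
        mid_ = items_.begin();
        std::advance(mid_, size_ / 2);
        other.mid_ = other.items_.begin();
        std::advance(other.mid_, other.size_ / 2);
    }
};
```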
Related
(Sorry about the title, it's not the most descriptive.)
I am playing with graph theory and generating all possible orderings of a given set of input numbers. Given the input set {2,3,4}, there are 3! = 6 possible permutations.
The following recursive solution works, but I don't like that I have to copy the input vector on every call in order to remove the element for the node I am following, preventing it from being output again. Elements I am going to output are stored in vecValues, whereas the elements I can currently choose from are stored in vecInput:
void OutputCombos(vector<int>& vecInput, vector<int>& vecValues)
{
    // When we hit 0 input size, output.
    if (vecInput.size() == 0)
    {
        for (int i : vecValues) cout << i << " ";
        cout << endl;
    }

    size_t nSize = vecInput.size();
    for (vector<int>::iterator iter = begin(vecInput); iter != end(vecInput); ++iter)
    {
        auto vecCopy = vecInput;
        vecCopy.erase(find(begin(vecCopy), end(vecCopy), *iter));
        vecValues.push_back(*iter);
        OutputCombos(vecCopy, vecValues);
        vecValues.pop_back();
    }
}

void OutputCombos(vector<int>& vecInput)
{
    vector<int> vecValues;
    OutputCombos(vecInput, vecValues);
}

int main()
{
    vector<int> vecInput{ 2,3,4 };
    OutputCombos(vecInput);
    return 0;
}
As expected from my state space tree, the output is
2 3 4
2 4 3
3 2 4
3 4 2
4 2 3
4 3 2
How can I get around this without having to make a copy of the vector for each recursive call please?
You could always just use std::next_permutation from <algorithm>
#include <algorithm>
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> input {2, 3, 4};
    do {
        for (auto i : input) std::cout << i << " ";
        std::cout << std::endl;
    } while (std::next_permutation(input.begin(), input.end()));
    return 0;
}
This gives you the same output. You might want to check out a possible implementation of next_permutation, which involves swaps within the vector rather than copying the vector several times.
I think this might be closer to what you're looking for. A version without std::next_permutation that doesn't involve copying any vectors, and allows the input to remain const. However, it does this at the cost of checking the output in each iteration to make sure it doesn't add the same number twice.
#include <vector>
#include <iostream>
#include <algorithm>

template<typename T>
void OutputCombinations(
    const std::vector<T>& input,
    std::vector<typename std::vector<T>::const_iterator>& output)
{
    for (auto it = input.begin(); it != input.end(); ++it)
    {
        if (std::find(output.begin(), output.end(), it) == output.end())
        {
            output.push_back(it);
            if (output.size() == input.size())
            {
                for (auto node : output) std::cout << *node << " ";
                std::cout << std::endl;
            }
            else OutputCombinations(input, output);
            output.pop_back();
        }
    }
}

int main()
{
    std::vector<int> nodes{ 2, 3, 4, 2 };
    std::vector<std::vector<int>::const_iterator> result{};
    OutputCombinations(nodes, result);
    return 0;
}
After much studying, I found inspiration in this article, which gave me the ultimate solution. The idea is to keep a vector of Boolean values indicating whether a particular value has already been used in the combination; that way we don't need to remove elements we have already used, so there is no memory-allocation overhead.
So, when building the branch {2,4,3}, if we get to {2,4}, vecTaken will be {true, false, true} and nNumBoolsSet will be 2. So when we loop, we will only "use" the element at index 1 of vecInput since that is the only element that has not been used as dictated by vecTaken.
void OutputCombos(vector<int>& vecInput, vector<int>& vecValues, vector<bool>& vecTaken, int& nNumBoolsSet)
{
    size_t nSize = vecInput.size();
    if (nNumBoolsSet == static_cast<int>(nSize))
    {
        for (int i : vecValues) cout << i << " ";
        cout << endl;
        return;
    }

    for (vector<int>::size_type i = 0; i < nSize; ++i)
    {
        if (vecTaken[i] == false)
        {
            vecValues.push_back(vecInput[i]);
            vecTaken[i] = true;
            ++nNumBoolsSet;
            OutputCombos(vecInput, vecValues, vecTaken, nNumBoolsSet);
            vecTaken[i] = false;
            vecValues.pop_back();
            --nNumBoolsSet;
        }
    }
}

void OutputCombos(vector<int>& vecInput)
{
    vector<int> vecValues;
    vector<bool> vecTaken(vecInput.size(), false);
    int nNumBoolsSet = 0;
    OutputCombos(vecInput, vecValues, vecTaken, nNumBoolsSet);
}

int main()
{
    vector<int> vecInput{ 2,3,4 };
    OutputCombos(vecInput);
}
This is a2.hpp, the file that can be edited. As far as I know the code is correct, just too slow. I am honestly lost here; I suspect my for loops are what's slowing me down so much - maybe I should use an iterator?
#ifndef A2_HPP
#define A2_HPP

// <algorithm>, <list>, <vector>
// YOU CAN CHANGE/EDIT ANY CODE IN THIS FILE AS LONG AS SEMANTICS IS UNCHANGED
#include <algorithm>
#include <list>
#include <vector>

class key_value_sequences {
private:
    std::list<std::vector<int>> seq;
    std::vector<std::vector<int>> keyref;

public:
    // YOU SHOULD USE C++ CONTAINERS TO AVOID RAW POINTERS
    // IF YOU DECIDE TO USE POINTERS, MAKE SURE THAT YOU MANAGE MEMORY PROPERLY

    // IMPLEMENT ME: SHOULD RETURN SIZE OF A SEQUENCE FOR GIVEN KEY
    // IF NO SEQUENCE EXISTS FOR A GIVEN KEY RETURN 0
    int size(int key) const;

    // IMPLEMENT ME: SHOULD RETURN POINTER TO A SEQUENCE FOR GIVEN KEY
    // IF NO SEQUENCE EXISTS FOR A GIVEN KEY RETURN nullptr
    const int* data(int key) const;

    // IMPLEMENT ME: INSERT VALUE INTO A SEQUENCE IDENTIFIED BY GIVEN KEY
    void insert(int key, int value);
}; // class key_value_sequences

int key_value_sequences::size(int key) const {
    // checks if the key is out of range or the sequence vector is empty
    if (key < 0 || static_cast<std::size_t>(key) >= keyref.size() || keyref[key].empty()) return 0;
    // subtract 1 because the first element is the key
    return keyref[key].size() - 1;
}

const int* key_value_sequences::data(int key) const {
    // checks if key index or ref vector is invalid
    if (key < 0 || keyref.size() < static_cast<unsigned int>(key + 1)) {
        return nullptr;
    }
    // at(1) accesses the first value (skipping the key at index 0)
    return &keyref[key].at(1);
}

void key_value_sequences::insert(int key, int value) {
    // checks if key is valid and if the ref vector needs to be resized
    if (key >= 0 && keyref.size() < static_cast<unsigned int>(key + 1)) {
        keyref.resize(key + 1);
        std::vector<int> val;
        seq.push_back(val);
        seq.back().push_back(key);
        seq.back().push_back(value);
        keyref[key] = seq.back();
    }
    // the index is already valid
    else if (key >= 0) keyref[key].push_back(value);
}

#endif // A2_HPP
This is a2.cpp, which just tests the functionality of a2.hpp; this code cannot be changed.
// DO NOT EDIT THIS FILE !!!
// YOUR CODE MUST BE CONTAINED IN a2.hpp ONLY
#include <iostream>
#include "a2.hpp"

int main(int argc, char* argv[]) {
    key_value_sequences A;
    {
        key_value_sequences T;
        // k will be our key
        for (int k = 0; k < 10; ++k) { // the actual tests will have way more than 10 sequences
            // v is our value
            // here we are creating 10 sequences:
            // key = 0, sequence = (0)
            // key = 1, sequence = (0 1)
            // key = 2, sequence = (0 1 2)
            // ...
            // key = 9, sequence = (0 1 2 3 4 5 6 7 8 9)
            for (int v = 0; v < k + 1; ++v) T.insert(k, v);
        }
        T = T;
        key_value_sequences V = T;
        A = V;
    }

    std::vector<int> ref;

    if (A.size(-1) != 0) {
        std::cout << "fail" << std::endl;
        return -1;
    }

    for (int k = 0; k < 10; ++k) {
        if (A.size(k) != k + 1) {
            std::cout << "fail";
            return -1;
        } else {
            ref.clear();
            for (int v = 0; v < k + 1; ++v) ref.push_back(v);
            if (!std::equal(ref.begin(), ref.end(), A.data(k))) {
                std::cout << "fail 3 " << A.data(k) << " " << ref[k];
                return -1;
            }
        }
    }

    std::cout << "pass" << std::endl;
    return 0;
} // main
If anyone could help me improve my codes efficiency I would really appreciate it, thanks.
First, I'm not convinced your code is correct. In insert, if the key is valid you create a new vector and insert it into seq. That sounds wrong, as it should only happen for a new key, but if your tests pass it might be fine.
Performance wise:
Avoid std::list. Linked lists have terrible performance on today's hardware because they break pipelining, caching and pre-fetching. Prefer std::vector instead; if the payload is really big and you are worried about copies, use std::vector<std::unique_ptr<T>>.
Try to avoid copying vectors. In your code, keyref[key] = seq.back() copies the vector; that should be fine here since it only holds a couple of elements.
Otherwise there are no obvious performance problems. Benchmark and profile your program to see where the slow parts are; usually there are only one or two places you need to optimize to get great performance. If it's still too slow, ask another question where you post your results so that we can better understand the problem.
I will join Sorin in saying don't use std::list if avoidable.
You use key as a direct index: where does it say the key is non-negative? Where does it say it is less than 100000000?
void key_value_sequences::insert(int key, int value) {
    // checks if key is valid and if the count vector needs to be resized
    if (key >= 0 && keyref.size() < static_cast<unsigned int>(key + 1)) {
        keyref.resize(key + 1);       // could be large
        std::vector<int> val;         // don't need this temporary
        seq.push_back(val);           // seq is useless?
        seq.back().push_back(key);
        seq.back().push_back(value);
        keyref[key] = seq.back();     // we now have 100000000-1 empty indexes
    }
    // the index is already valid
    else if (key >= 0) keyref[key].push_back(value);
}
Can it be done faster? Depending on your key range, yes it can. You will need to implement a flat_map or a hash_map.
C++11 concept code for a flat_map version.
// effectively a binary search over the sorted-by-key entries
std::vector<std::vector<int>>::const_iterator
key_value_sequences::find_it(int key) const {
    return std::lower_bound(keyref.begin(), keyref.end(), key,
                            [](const std::vector<int>& check, int k) {
        return check[0] < k; // the key is the 0-element of each entry
    });
}

void key_value_sequences::insert(int key, int value) {
    auto found = find_it(key);
    // at the end or not found
    if (found == keyref.end() || found->front() != key) {
        found = keyref.insert(found, {key}); // add entry holding just the key
    }
    // convert the const_iterator to an index to get a mutable entry
    keyref[found - keyref.begin()].push_back(value); // update entry, whether new or old
}

const int* key_value_sequences::data(int key) const {
    // checks if the key exists
    auto found = find_it(key);
    if (found == keyref.end() || found->front() != key)
        return nullptr;
    // &found->at(1) points at the first value (the key sits at index 0)
    return &found->at(1);
}
(hope I got that right ...)
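For comparison, the hash_map route using only the standard library: a std::unordered_map keyed by int gives average O(1) insert and lookup. This is a sketch with a hypothetical class name, and it stores the sequences differently from the flat_map version (no key element at index 0):

```cpp
#include <unordered_map>
#include <vector>

// Hypothetical hash-map-backed variant of the key/value-sequence interface.
class key_value_sequences_hash {
    std::unordered_map<int, std::vector<int>> map_;
public:
    int size(int key) const {
        auto it = map_.find(key);
        return it == map_.end() ? 0 : static_cast<int>(it->second.size());
    }
    const int* data(int key) const {
        auto it = map_.find(key);
        return it == map_.end() ? nullptr : it->second.data();
    }
    void insert(int key, int value) {
        map_[key].push_back(value); // operator[] creates the sequence on first insert
    }
};
```

The trade-off against the flat_map is the usual one: hashing avoids the O(log n) search, but the buckets are not contiguous in memory.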
There are a lot of questions suggesting that one should always use a vector, but it seems to me that a list would be better for a scenario where we need to store "the last n items".
For example, say we need to store the last 5 items seen:
Iteration 0:
3,24,51,62,37,
Then at each iteration, the item at index 0 is removed, and the new item is added at the end:
Iteration 1:
24,51,62,37,8
Iteration 2:
51,62,37,8,12
It seems that for this use case a vector's complexity will be O(n), since we would have to copy n items, but for a list it should be O(1), since each iteration we are just chopping off the head and adding to the tail.
Is my understanding correct? Is this the actual behaviour of an std::list ?
Neither. Your collection has a fixed size and std::array is sufficient.
The data structure you implement is called a ring buffer. To implement it you create an array and keep track of the offset of the current first element.
When you add an element that would push an item out of the buffer - i.e. when you remove the first element - you increment the offset.
To fetch elements in the buffer you add the index and the offset and take the modulo of this and the length of the buffer.
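The description above can be sketched in a few lines (a hypothetical minimal class, fixed capacity N, no bounds checking):

```cpp
#include <array>
#include <cstddef>

// Minimal "last N items" ring buffer: an array plus a start offset.
template <typename T, std::size_t N>
class LastN {
    std::array<T, N> buf_{};
    std::size_t start_ = 0, count_ = 0;
public:
    void push(const T& v) {
        if (count_ < N) {
            buf_[(start_ + count_) % N] = v;
            ++count_;
        } else {                              // buffer full: overwrite the oldest,
            buf_[start_] = v;                 // then advance the start offset
            start_ = (start_ + 1) % N;
        }
    }
    std::size_t size() const { return count_; }
    const T& operator[](std::size_t i) const { // index 0 = oldest element
        return buf_[(start_ + i) % N];
    }
};
```

Using the question's data: after pushing 3, 24, 51, 62, 37 and then 8, the logical contents become 24, 51, 62, 37, 8 without moving any element.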
std::deque is a far better option. Or if you had benchmarked std::deque and found its performance to be inadequate for your specific use, you could implement a circular buffer in a fixed size array, storing the index of the start of the buffer. When replacing an element in the buffer, you would overwrite the element at the start index, and then set the start index to its previous value plus one modulo the size of the buffer.
List traversal is very slow, as list elements can be scattered throughout memory, and vector shifting is actually surprisingly fast, as memory moves on a single block of memory are quite fast even if it is a large block.
The talk Taming The Performance Beast from the Meeting C++ 2015 conference might be of interest to you.
If you can use Boost, try boost::circular_buffer:
It's a kind of sequence similar to std::list or std::deque. It supports random access iterators, constant time insert and erase operations at the beginning or the end of the buffer and interoperability with std algorithms.
It provides fixed capacity storage: when the buffer is filled, new data is written starting at the beginning of the buffer, overwriting the old data.
// Create a circular buffer with a capacity for 5 integers.
boost::circular_buffer<int> cb(5);
// Insert elements into the buffer.
cb.push_back(3);
cb.push_back(24);
cb.push_back(51);
cb.push_back(62);
cb.push_back(37);
int a = cb[0]; // a == 3
int b = cb[1]; // b == 24
int c = cb[2]; // c == 51
// The buffer is full now, so pushing subsequent
// elements will overwrite the front-most elements.
cb.push_back(8); // overwrite 3 with 8
cb.push_back(12); // overwrite 24 with 12
// The buffer now contains 51, 62, 37, 8, 12.
// Elements can be popped from either the front or the back.
cb.pop_back(); // 12 is removed
cb.pop_front(); // 51 is removed
The circular_buffer stores its elements in a contiguous region of memory, which then enables fast constant-time insertion, removal and random access of elements.
PS ... or implement the circular buffer directly as suggested by Taemyr.
Overload Journal #50 - Aug 2002 has a nice introduction (by Pete Goodliffe) to writing robust STL-like circular buffer.
The problem is that O(n) only talks about the asymptotic behaviour as n tends to infinity. If n is small then the constant factors involved become significant. The result is that for "last 5 integer items" I would be stunned if vector didn't beat list. I would even expect std::vector to beat std::deque.
For "last 500 integer items" I would still expect std::vector to be faster than std::list - but std::deque would now probably win. For "last 5 million slow-to-copy items", std::vector would be slowest of all.
A ring buffer based on std::array or std::vector would probably be faster still though.
As (almost) always with performance issues:
encapsulate with a fixed interface
write the simplest code that can implement that interface
if profiling shows you have a problem, optimize (which will make the code more complicated).
In practice, just using a std::deque, or a pre-built ring buffer if you have one, will be good enough. (But it's not worth going to the trouble of writing a ring buffer unless profiling says you need to.)
Here is a minimal circular buffer. I'm primarily posting it here to get a metric ton of comments and ideas for improvement.
Minimal Implementation
#include <iterator>
#include <iterator>

template<typename Container>
class CircularBuffer
{
public:
    using iterator   = typename Container::iterator;
    using value_type = typename Container::value_type;

private:
    Container _container;
    iterator _pos;

public:
    CircularBuffer() : _pos(std::begin(_container)) {}

public:
    value_type& operator*() const { return *_pos; }

    CircularBuffer& operator++()
    {
        ++_pos;
        if (_pos == std::end(_container)) _pos = std::begin(_container);
        return *this;
    }

    CircularBuffer& operator--()
    {
        if (_pos == std::begin(_container)) _pos = std::end(_container);
        --_pos;
        return *this;
    }
};
Usage
#include <iostream>
#include <array>
int main()
{
CircularBuffer<std::array<int,5>> buf;
*buf = 1; ++buf;
*buf = 2; ++buf;
*buf = 3; ++buf;
*buf = 4; ++buf;
*buf = 5; ++buf;
std::cout << *buf << " "; ++buf;
std::cout << *buf << " "; ++buf;
std::cout << *buf << " "; ++buf;
std::cout << *buf << " "; ++buf;
std::cout << *buf << " "; ++buf;
std::cout << *buf << " "; ++buf;
std::cout << *buf << " "; ++buf;
std::cout << *buf << " "; --buf;
std::cout << *buf << " "; --buf;
std::cout << *buf << " "; --buf;
std::cout << *buf << " "; --buf;
std::cout << *buf << " "; --buf;
std::cout << *buf << " "; --buf;
std::cout << std::endl;
}
Compile with
g++ -std=c++17 -O2 -Wall -Wextra -pedantic -Werror
Demo
On Coliru: try it online
If you need to store the last N elements, then logically you are doing some kind of queue or circular buffer; std::queue and std::stack are the standard adaptors implementing FIFO and LIFO queues.
You can use boost::circular_buffer or implement simple circular buffer manually:
#include <cassert>

template<int Capacity>
class cbuffer
{
public:
    cbuffer() : sz(0), p(0) {}

    void push_back(int n)
    {
        buf[p++] = n;
        if (sz < Capacity)
            sz++;
        if (p >= Capacity)
            p = 0;
    }

    int size() const
    {
        return sz;
    }

    int operator[](int n) const
    {
        assert(n < sz);
        n = p - sz + n;
        if (n < 0)
            n += Capacity;
        return buf[n];
    }

    int buf[Capacity];
    int sz, p;
};
Sample use for circular buffer of 5 int elements:
int main()
{
    cbuffer<5> buf;

    // insert 100 random numbers
    for (int i = 0; i < 100; ++i)
        buf.push_back(rand());

    // output the contents of the circular buffer to cout
    for (int i = 0; i < buf.size(); ++i)
        cout << buf[i] << ' ';
}
As a note, keep in mind that when you have only 5 elements the best solution is the one that's fast to implement and works correctly.
Yes. The time complexity of std::vector for removing elements from the beginning is linear. std::deque might be a good choice for what you are doing, as it offers constant-time insertion and removal at both the beginning and the end, as well as better performance than std::list.
Source:
http://www.sgi.com/tech/stl/Vector.html
http://www.sgi.com/tech/stl/Deque.html
Here are the beginnings of a ring-buffer-based dequeue template class that I wrote a while ago, mostly to experiment with std::allocator (so it does not require T to be default-constructible). Note that it currently doesn't have iterators, insert/remove, or copy/move constructors.
#ifndef RING_DEQUEUE_H
#define RING_DEQUEUE_H
#include <cstddef>          // std::size_t, std::ptrdiff_t
#include <initializer_list>
#include <limits>
#include <memory>
#include <stdexcept>        // std::logic_error
#include <type_traits>
template <typename T, size_t N>
class ring_dequeue {
private:
static_assert(N <= std::numeric_limits<size_t>::max() / 2 &&
N <= std::numeric_limits<size_t>::max() / sizeof(T),
"size of ring_dequeue is too large");
using alloc_traits = std::allocator_traits<std::allocator<T>>;
public:
using value_type = T;
using reference = T&;
using const_reference = const T&;
using difference_type = std::ptrdiff_t; // ssize_t is POSIX-only
using size_type = size_t;
ring_dequeue() = default;
// Disable copy and move constructors for now - if iterators are
// implemented later, then those could be delegated to the InputIterator
// constructor below (using the std::move_iterator adaptor for the move
// constructor case).
ring_dequeue(const ring_dequeue&) = delete;
ring_dequeue(ring_dequeue&&) = delete;
ring_dequeue& operator=(const ring_dequeue&) = delete;
ring_dequeue& operator=(ring_dequeue&&) = delete;
template <typename InputIterator>
ring_dequeue(InputIterator begin, InputIterator end) {
while (m_tailIndex < N && begin != end) {
alloc_traits::construct(m_alloc, reinterpret_cast<T*>(m_buf) + m_tailIndex,
*begin);
++m_tailIndex;
++begin;
}
if (begin != end)
throw std::logic_error("Input range too long");
}
ring_dequeue(std::initializer_list<T> il) :
ring_dequeue(il.begin(), il.end()) { }
~ring_dequeue() noexcept(std::is_nothrow_destructible<T>::value) {
while (m_headIndex < m_tailIndex) {
alloc_traits::destroy(m_alloc, elemPtr(m_headIndex));
m_headIndex++;
}
}
size_t size() const {
return m_tailIndex - m_headIndex;
}
size_t max_size() const {
return N;
}
bool empty() const {
return m_headIndex == m_tailIndex;
}
bool full() const {
return m_headIndex + N == m_tailIndex;
}
template <typename... Args>
void emplace_front(Args&&... args) {
if (full())
throw std::logic_error("ring_dequeue full");
bool wasAtZero = (m_headIndex == 0);
auto newHeadIndex = wasAtZero ? (N - 1) : (m_headIndex - 1);
alloc_traits::construct(m_alloc, elemPtr(newHeadIndex),
std::forward<Args>(args)...);
m_headIndex = newHeadIndex;
if (wasAtZero)
m_tailIndex += N;
}
void push_front(const T& x) {
emplace_front(x);
}
void push_front(T&& x) {
emplace_front(std::move(x));
}
template <typename... Args>
void emplace_back(Args&&... args) {
if (full())
throw std::logic_error("ring_dequeue full");
alloc_traits::construct(m_alloc, elemPtr(m_tailIndex),
std::forward<Args>(args)...);
++m_tailIndex;
}
void push_back(const T& x) {
emplace_back(x);
}
void push_back(T&& x) {
emplace_back(std::move(x));
}
T& front() {
if (empty())
throw std::logic_error("ring_dequeue empty");
return *elemPtr(m_headIndex);
}
const T& front() const {
if (empty())
throw std::logic_error("ring_dequeue empty");
return *elemPtr(m_headIndex);
}
void remove_front() {
if (empty())
throw std::logic_error("ring_dequeue empty");
alloc_traits::destroy(m_alloc, elemPtr(m_headIndex));
++m_headIndex;
if (m_headIndex == N) {
m_headIndex = 0;
m_tailIndex -= N;
}
}
T pop_front() {
T result = std::move(front());
remove_front();
return result;
}
T& back() {
if (empty())
throw std::logic_error("ring_dequeue empty");
return *elemPtr(m_tailIndex - 1);
}
const T& back() const {
if (empty())
throw std::logic_error("ring_dequeue empty");
return *elemPtr(m_tailIndex - 1);
}
void remove_back() {
if (empty())
throw std::logic_error("ring_dequeue empty");
alloc_traits::destroy(m_alloc, elemPtr(m_tailIndex - 1));
--m_tailIndex;
}
T pop_back() {
T result = std::move(back());
remove_back();
return result;
}
private:
alignas(T) char m_buf[N * sizeof(T)];
size_t m_headIndex = 0;
size_t m_tailIndex = 0;
std::allocator<T> m_alloc;
const T* elemPtr(size_t index) const {
if (index >= N)
index -= N;
return reinterpret_cast<const T*>(m_buf) + index;
}
T* elemPtr(size_t index) {
if (index >= N)
index -= N;
return reinterpret_cast<T*>(m_buf) + index;
}
};
#endif
Briefly: std::vector is better when the amount of memory does not change. In your case, moving all the data forward or appending new data in a vector would be wasteful. As David said, std::deque is a good option, since you pop from the head and push to the back, like a double-ended list.
From the cplusplus.com reference about list:
Compared to other base standard sequence containers (array, vector and
deque), lists perform generally better in inserting, extracting and
moving elements in any position within the container for which an
iterator has already been obtained, and therefore also in algorithms
that make intensive use of these, like sorting algorithms.
The main drawback of lists and forward_lists compared to these other
sequence containers is that they lack direct access to the elements by
their position; For example, to access the sixth element in a list,
one has to iterate from a known position (like the beginning or the
end) to that position, which takes linear time in the distance between
these. They also consume some extra memory to keep the linking
information associated to each element (which may be an important
factor for large lists of small-sized elements).
about deque
For operations that involve frequent insertion or removals of elements
at positions other than the beginning or the end, deques perform worse
and have less consistent iterators and references than lists and
forward lists.
about vector
Therefore, compared to arrays, vectors consume more memory in exchange
for the ability to manage storage and grow dynamically in an efficient
way.
Compared to the other dynamic sequence containers (deques, lists and
forward_lists), vectors are very efficient accessing its elements
(just like arrays) and relatively efficient adding or removing
elements from its end. For operations that involve inserting or
removing elements at positions other than the end, they perform worse
than the others, and have less consistent iterators and references
than lists and forward_lists.
I think even std::deque has item-copying overhead in certain conditions, because std::deque is essentially a map of arrays, so std::list is a good way to eliminate that copy overhead.
To speed up traversal of a std::list, you can implement a memory pool so that the list allocates its nodes from one chunk, which gives spatial locality for caching.
I've got the following problem. I have a game which runs at an average of 60 frames per second. Each frame I need to store values in a container, and there must be no duplicates.
It probably has to store fewer than 100 items per frame, but the number of insert calls will be a lot higher (with many rejected because the values must be unique). Only at the end of the frame do I need to traverse the container. So about 60 traversals of the container per second, but a lot more insertions.
Keep in mind the items to store are simple integers.
There are a bunch of containers I could use for this, but I cannot make up my mind what to pick. Performance is the key issue.
Some pros/cons that I've gathered:
vector
(PRO): Contiguous memory, a huge factor.
(PRO): Memory can be reserved first, so very few allocations/deallocations afterwards
(CON): No alternative but to traverse the container (std::find) on each insert() to check for unique keys? The comparison is cheap though (integers), and the whole container can probably fit in the cache
set
(PRO): Simple, clearly meant for this
(CON): Not constant insert-time
(CON): A lot of allocations/deallocations per frame
(CON): Not contiguous memory. Traversing a set of hundreds of objects means jumping around in memory a lot.
unordered_set
(PRO): Simple, clearly meant for this
(PRO): Average case constant time insert
(CON): Seeing as I store integers, the hash operation is probably a lot more expensive than anything else
(CON): A lot of allocations/deallocations per frame
(CON): Not contiguous memory. Traversing a set of hundreds of objects means jumping around in memory a lot.
I'm leaning toward the vector route because of the memory access patterns, even though set is clearly meant for this issue. The big issue that is unclear to me is whether traversing the vector on each insert is more costly than the allocations/deallocations (especially considering how often this must be done) and the memory lookups of set.
I know ultimately it all comes down to profiling each case, but as a head start, or just theoretically, what would probably be best in this scenario? Are there any pros/cons I might have missed as well?
EDIT: As I didn't mention it: the container is cleared at the end of each frame
I did timing with a few different methods that I thought were likely candidates. Using std::unordered_set was the winner.
Here are my results:
Using UnorderedSet: 0.078s
Using UnsortedVector: 0.193s
Using OrderedSet: 0.278s
Using SortedVector: 0.282s
Timing is based on the median of five runs for each case.
compiler: gcc version 4.9.1
flags: -std=c++11 -O2
OS: ubuntu 4.9.1
CPU: Intel(R) Core(TM) i5-4690K CPU @ 3.50GHz
Code:
#include <algorithm>
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <numeric>   // for std::accumulate
#include <random>
#include <set>
#include <unordered_set>
#include <vector>

using std::cerr;

static const size_t n_distinct = 100;

template <typename Engine>
static std::vector<int> randomInts(Engine &engine, size_t n)
{
    // note: uniform_int_distribution's range is inclusive, so this
    // produces values in [0, n_distinct]
    auto distribution = std::uniform_int_distribution<int>(0, n_distinct);
    auto generator = [&]{ return distribution(engine); };
    auto vec = std::vector<int>();
    std::generate_n(std::back_inserter(vec), n, generator);
    return vec;
}

struct UnsortedVectorSmallSet {
    std::vector<int> values;
    static const char *name() { return "UnsortedVector"; }
    UnsortedVectorSmallSet() { values.reserve(n_distinct); }

    void insert(int new_value)
    {
        auto iter = std::find(values.begin(), values.end(), new_value);
        if (iter != values.end()) return;
        values.push_back(new_value);
    }
};

struct SortedVectorSmallSet {
    std::vector<int> values;
    static const char *name() { return "SortedVector"; }
    SortedVectorSmallSet() { values.reserve(n_distinct); }

    void insert(int new_value)
    {
        auto iter = std::lower_bound(values.begin(), values.end(), new_value);
        if (iter == values.end()) {
            values.push_back(new_value);
            return;
        }
        if (*iter == new_value) return;
        values.insert(iter, new_value);
    }
};

struct OrderedSetSmallSet {
    std::set<int> values;
    static const char *name() { return "OrderedSet"; }
    void insert(int new_value) { values.insert(new_value); }
};

struct UnorderedSetSmallSet {
    std::unordered_set<int> values;
    static const char *name() { return "UnorderedSet"; }
    void insert(int new_value) { values.insert(new_value); }
};

int main()
{
    //using SmallSet = UnsortedVectorSmallSet;
    //using SmallSet = SortedVectorSmallSet;
    //using SmallSet = OrderedSetSmallSet;
    using SmallSet = UnorderedSetSmallSet;

    auto engine = std::default_random_engine();
    std::vector<int> values_to_insert = randomInts(engine, 10000000);
    SmallSet small_set;
    namespace chrono = std::chrono;
    using chrono::system_clock;
    auto start_time = system_clock::now();
    for (auto value : values_to_insert) {
        small_set.insert(value);
    }
    auto end_time = system_clock::now();
    auto &result = small_set.values;
    auto sum = std::accumulate(result.begin(), result.end(), 0u);
    auto elapsed_seconds = chrono::duration<float>(end_time - start_time).count();
    cerr << "Using " << SmallSet::name() << ":\n";
    cerr << "  sum=" << sum << "\n";
    cerr << "  elapsed: " << elapsed_seconds << "s\n";
}
I'm going to put my neck on the block here and suggest that the vector route is probably the most efficient when the size is 100 and the objects being stored are integral values. The simple reason for this is that set and unordered_set allocate memory for every insert, whereas the vector needs to allocate no more than once.
You can increase search performance dramatically by keeping the vector ordered, since then all searches can be binary searches and therefore complete in log2N time.
The downside is that the inserts will take a tiny fraction longer due to the memory moves, but it sounds as if there will be many more searches than inserts, and moving (on average) 50 contiguous memory words is an almost instantaneous operation.
Final word:
Write the correct logic now. Worry about performance when the users are complaining.
EDIT:
Because I couldn't help myself, here's a reasonably complete implementation:
#include <algorithm>  // std::lower_bound
#include <cstddef>    // size_t
#include <utility>    // std::pair, std::make_pair
#include <vector>

template<typename T>
struct vector_set
{
    using vec_type = std::vector<T>;
    using const_iterator = typename vec_type::const_iterator;
    using iterator = typename vec_type::iterator;

    vector_set(size_t max_size)
    : _max_size { max_size }
    {
        _v.reserve(_max_size);
    }

    /// @returns: pair of iterator, bool
    /// If the value has been inserted, the bool will be true;
    /// the iterator will point to the value, or to end() if it wasn't
    /// inserted due to space exhaustion
    auto insert(const T& elem)
    -> std::pair<iterator, bool>
    {
        if (_v.size() < _max_size) {
            auto it = std::lower_bound(_v.begin(), _v.end(), elem);
            if (_v.end() == it || *it != elem) {
                return std::make_pair(_v.insert(it, elem), true);
            }
            return std::make_pair(it, false);
        }
        else {
            return std::make_pair(_v.end(), false);
        }
    }

    auto find(const T& elem) const
    -> const_iterator
    {
        auto vend = _v.end();
        auto it = std::lower_bound(_v.begin(), vend, elem);
        if (it != vend && *it != elem)
            it = vend;
        return it;
    }

    bool contains(const T& elem) const {
        return find(elem) != _v.end();
    }

    const_iterator begin() const {
        return _v.begin();
    }

    const_iterator end() const {
        return _v.end();
    }

private:
    vec_type _v;
    size_t _max_size;
};
// requires Boost.Test (e.g. #include <boost/test/unit_test.hpp>)
#include <algorithm>  // std::copy
#include <cstdlib>    // random()
#include <iostream>
#include <iterator>   // std::ostream_iterator

using namespace std;

BOOST_AUTO_TEST_CASE(play_unique_vector)
{
    vector_set<int> v(100);
    for (size_t i = 0; i < 1000000; ++i) {
        v.insert(int(random() % 200));
    }

    cout << "unique integers:" << endl;
    copy(begin(v), end(v), ostream_iterator<int>(cout, ","));
    cout << endl;

    cout << "contains 100: " << v.contains(100) << endl;
    cout << "contains 101: " << v.contains(101) << endl;
    cout << "contains 102: " << v.contains(102) << endl;
    cout << "contains 103: " << v.contains(103) << endl;
}
As you said you have many insertions and only one traversal, I'd suggest using a vector and pushing the elements in regardless of whether they are already in the vector. Each push is done in amortized O(1).
Just when you need to go through the vector, sort it and remove the duplicate elements. I believe this can be done in O(n), as they are bounded integers.
EDIT: sorting in linear time through counting sort is presented in this video. If that is not feasible, then you are back to O(n lg(n)).
You will have very little cache miss because of the contiguity of the vector in memory, and very few allocations (especially if you reserve enough memory in the vector).
I am trying to merge two arrays/lists where each element of one array has to be compared against the other. If there is an identical element in both, I increase its total occurrence count by one. The arrays are both 2D, where each element carries a counter for its occurrences. I know both of these arrays can be compared with a double for loop in O(n^2); however, I am limited to a bound of O(n log n). The final array will have all of the elements from both lists, with their counters increased where there is more than one occurrence.
Array A[][] = [[8,1],[5,1]]
Array B[][] = [[2,1],[8,1]]
After the merge is complete I should get an array like so
Array C[][] = [[2,1],[8,2],[8,2],[5,1]]
The arrangement of the elements does not matter.
From my reading, merge sort takes O(n log n) to merge two lists; however, I am currently at a roadblock with my bound problem. Any pseudocode or visual aid would be appreciated.
I quite like Stepanov's Efficient Programming lectures, although they are rather slow-paced. In sessions 6 and 7 (if I recall correctly) he discusses the algorithms add_to_counter() and reduce_counter(). Both algorithms are entirely trivial, of course, but they can be used to implement a non-recursive merge sort without too much effort. The only possibly non-obvious insight is that the combining operation can reduce the two elements into a sequence rather than just one element. To do the operations in place, you'd actually store iterators (i.e., pointers in the case of arrays), using a suitable class to represent a partial view of an array.
I haven't watched the sessions beyond session 7 (and actually not even all of session 7 yet), but I would fully expect that he presents how to use the counter produced in session 7 to implement, e.g., merge sort. Of course, the run-time complexity of merge sort is O(n ln n) and, when using the counter approach, it will use O(ln n) auxiliary space.
A simple algorithm that requires twice as much memory would be to sort both inputs (O(n log n)) and then sequentially pick the elements from the heads of both lists and merge them (O(n)). The overall cost is O(n log n), with O(n) extra memory (proportional to the smaller of the two inputs).
Here's my algorithm based on bucket counting
time complexity: O(n)
memory complexity: O(max), where max is the maximum element in the arrays
Output:
[8,2][5,1][2,1][8,2]
Code:
#include <deque>
#include <iostream>
#include <utility>
#include <vector>

// Grow the counter table if needed and bump the count for `in`.
// counters is a deque rather than a vector on purpose: growing a deque
// at the back does not invalidate references to its existing elements,
// so the int& references stored in the result stay valid after a resize
// (with a vector, a reallocation would leave them dangling).
int &refreshCount(std::deque<int> &counters, int in) {
    if (static_cast<int>(counters.size()) <= in) {
        counters.resize(in + 1);
    }
    return ++counters[in];
}

void copyWithCounts(std::vector<std::pair<int, int> >::iterator it,
                    std::vector<std::pair<int, int> >::iterator end,
                    std::deque<int> &counters,
                    std::vector<std::pair<int, int&> > &result)
{
    while (it != end) {
        int &count = refreshCount(counters, (*it).first);
        result.push_back(std::pair<int, int&>((*it).first, count));
        ++it;
    }
}

void countingMerge(std::vector<std::pair<int, int> > &array1,
                   std::vector<std::pair<int, int> > &array2,
                   std::vector<std::pair<int, int&> > &result)
{
    std::deque<int> counters = {0};
    copyWithCounts(array1.begin(), array1.end(), counters, result);
    copyWithCounts(array2.begin(), array2.end(), counters, result);
}

int main()
{
    std::vector<std::pair<int, int> > array1 = {{8, 1}, {5, 1}};
    std::vector<std::pair<int, int> > array2 = {{2, 1}, {8, 1}};
    std::vector<std::pair<int, int&> > result;

    countingMerge(array1, array2, result);

    for (auto it = result.begin(); it != result.end(); ++it) {
        std::cout << "[" << (*it).first << "," << (*it).second << "] ";
    }
    return 0;
}
Short explanation:
Because you mentioned that the final arrangement does not matter, I did a simple merge (without sorting; nobody asked for sorting) with counting, where the result holds references to the counters, so there is no need to walk through the array afterwards to update the counts.
You could write an algorithm to merge them by walking both sequences sequentially in order, inserting where appropriate.
I've chosen a (seemingly more apt) data structure here: std::map<Value, Occurence>:
#include <map>
using namespace std;
using Value = int;
using Occurence = unsigned;
using Histo = map<Value, Occurence>;
If you insist on contiguous storage, boost::container::flat_map<> should be your friend here (and a drop-in replacement).
The algorithm (tested with your inputs, read comments for explanation):
void MergeInto(Histo& target, Histo const& other)
{
auto left_it = begin(target), left_end = end(target);
auto right_it = begin(other), right_end = end(other);
auto const& cmp = target.value_comp();
while (right_it != right_end)
{
if ((left_it == left_end) || cmp(*right_it, *left_it))
{
// insert at left_it
target.insert(left_it, *right_it);
++right_it; // and carry on
} else if (cmp(*left_it, *right_it))
{
++left_it; // keep left_it first, so increment it
} else
{
// keys match!
left_it->second += right_it->second;
++left_it;
++right_it;
}
}
}
It's really quite straightforward!
A test program: See it Live On Coliru
#include <iostream>
// for debug output
static inline std::ostream& operator<<(std::ostream& os, Histo::value_type const& v) { return os << "{" << v.first << "," << v.second << "}"; }
static inline std::ostream& operator<<(std::ostream& os, Histo const& v) { for (auto& el : v) os << el << " "; return os; }
//
int main(int argc, char *argv[])
{
Histo A { { 8, 1 }, { 5, 1 } };
Histo B { { 2, 1 }, { 8, 1 } };
std::cout << "A: " << A << "\n";
std::cout << "B: " << B << "\n";
MergeInto(A, B);
std::cout << "merged: " << A << "\n";
}
Printing:
A: {5,1} {8,1}
B: {2,1} {8,1}
merged: {2,1} {5,1} {8,2}
You could shuffle the interface a tiny bit in case you really wanted to merge into a new object (C):
// convenience
Histo Merge(Histo const& left, Histo const& right)
{
auto copy(left);
MergeInto(copy, right);
return copy;
}
Now you can just write
Histo A { { 8, 1 }, { 5, 1 } };
Histo B { { 2, 1 }, { 8, 1 } };
auto C = Merge(A, B);
See that Live on Coliru, too