What container to store unique values? - c++

I've got the following problem. I have a game which runs at 60 frames per second on average. Each frame I need to store values in a container, and there must be no duplicates.
It will probably store fewer than 100 items per frame, but the number of insert calls will be a lot higher (and many of them will be rejected because the values must be unique). Only at the end of the frame do I need to traverse the container. So there are about 60 traversals of the container per second, but far more insertions.
Keep in mind that the items to store are simple integers.
There are a bunch of containers I could use for this, but I cannot make up my mind which one to pick. Performance is the key issue here.
Some pros/cons that I've gathered:
vector
(PRO): Contiguous memory, a huge factor.
(PRO): Memory can be reserved first, very few allocations/deallocations afterwards
(CON): No alternative to traversing the container (std::find) on each insert() to keep the keys unique? The comparison is simple though (integers), and the whole container can probably fit in the cache
set
(PRO): Simple, clearly meant for this
(CON): Not constant insert-time
(CON): A lot of allocations/deallocations per frame
(CON): Not contiguous memory. Traversing a set of hundreds of objects means jumping around a lot in memory.
unordered_set
(PRO): Simple, clearly meant for this
(PRO): Average case constant time insert
(CON): Seeing as I store integers, the hash operation is probably a lot more expensive than anything else
(CON): A lot of allocations/deallocations per frame
(CON): Not contiguous memory. Traversing a set of hundreds of objects means jumping around a lot in memory.
I'm leaning towards the vector route because of the memory access patterns, even though set is clearly meant for this problem. The big question for me is whether traversing the vector on each insert is more costly than the allocations/deallocations of set (especially considering how often this must be done) and its scattered memory lookups.
I know ultimately it all comes down to profiling each case, but as a head start, or just theoretically, what would probably be best in this scenario? Are there any pros/cons I might have missed as well?
EDIT: As I didn't mention before, the container is cleared (clear()) at the end of each frame.

I did timing with a few different methods that I thought were likely candidates. Using std::unordered_set was the winner.
Here are my results:
Using UnorderedSet: 0.078s
Using UnsortedVector: 0.193s
Using OrderedSet: 0.278s
Using SortedVector: 0.282s
Timing is based on the median of five runs for each case.
compiler: gcc version 4.9.1
flags: -std=c++11 -O2
OS: ubuntu 4.9.1
CPU: Intel(R) Core(TM) i5-4690K CPU @ 3.50GHz
Code:
#include <algorithm>
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <iterator>  // std::back_inserter
#include <numeric>   // std::accumulate
#include <random>
#include <set>
#include <unordered_set>
#include <vector>
using std::cerr;
static const size_t n_distinct = 100;
template <typename Engine>
static std::vector<int> randomInts(Engine &engine,size_t n)
{
auto distribution = std::uniform_int_distribution<int>(0,n_distinct);
auto generator = [&]{return distribution(engine);};
auto vec = std::vector<int>();
std::generate_n(std::back_inserter(vec),n,generator);
return vec;
}
struct UnsortedVectorSmallSet {
std::vector<int> values;
static const char *name() { return "UnsortedVector"; }
UnsortedVectorSmallSet() { values.reserve(n_distinct); }
void insert(int new_value)
{
auto iter = std::find(values.begin(),values.end(),new_value);
if (iter!=values.end()) return;
values.push_back(new_value);
}
};
struct SortedVectorSmallSet {
std::vector<int> values;
static const char *name() { return "SortedVector"; }
SortedVectorSmallSet() { values.reserve(n_distinct); }
void insert(int new_value)
{
auto iter = std::lower_bound(values.begin(),values.end(),new_value);
if (iter==values.end()) {
values.push_back(new_value);
return;
}
if (*iter==new_value) return;
values.insert(iter,new_value);
}
};
struct OrderedSetSmallSet {
std::set<int> values;
static const char *name() { return "OrderedSet"; }
void insert(int new_value) { values.insert(new_value); }
};
struct UnorderedSetSmallSet {
std::unordered_set<int> values;
static const char *name() { return "UnorderedSet"; }
void insert(int new_value) { values.insert(new_value); }
};
int main()
{
//using SmallSet = UnsortedVectorSmallSet;
//using SmallSet = SortedVectorSmallSet;
//using SmallSet = OrderedSetSmallSet;
using SmallSet = UnorderedSetSmallSet;
auto engine = std::default_random_engine();
std::vector<int> values_to_insert = randomInts(engine,10000000);
SmallSet small_set;
namespace chrono = std::chrono;
using chrono::system_clock;
auto start_time = system_clock::now();
for (auto value : values_to_insert) {
small_set.insert(value);
}
auto end_time = system_clock::now();
auto& result = small_set.values;
auto sum = std::accumulate(result.begin(),result.end(),0u);
auto elapsed_seconds = chrono::duration<float>(end_time-start_time).count();
cerr << "Using " << SmallSet::name() << ":\n";
cerr << " sum=" << sum << "\n";
cerr << " elapsed: " << elapsed_seconds << "s\n";
}

I'm going to put my neck on the block here and suggest that the vector route is probably the most efficient when the size is 100 and the objects being stored are integral values. The simple reason is that set and unordered_set allocate memory for each inserted element, whereas the vector needn't allocate more than once.
You can increase search performance dramatically by keeping the vector ordered, since then all searches can be binary searches and therefore complete in O(log N) time.
The downside is that the inserts will take a tiny fraction longer due to the memory moves, but it sounds as if there will be many more searches than inserts, and moving (average) 50 contiguous memory words is an almost instantaneous operation.
Final word:
Write the correct logic now. Worry about performance when the users are complaining.
EDIT:
Because I couldn't help myself, here's a reasonably complete implementation:
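#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <iterator>
#include <utility>
#include <vector>
template<typename T>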
struct vector_set
{
using vec_type = std::vector<T>;
using const_iterator = typename vec_type::const_iterator;
using iterator = typename vec_type::iterator;
vector_set(size_t max_size)
: _max_size { max_size }
{
_v.reserve(_max_size);
}
/// @returns: pair of iterator, bool
/// If the value has been inserted, the bool will be true
/// the iterator will point to the value, or end if it wasn't
/// inserted due to space exhaustion
auto insert(const T& elem)
-> std::pair<iterator, bool>
{
if (_v.size() < _max_size) {
auto it = std::lower_bound(_v.begin(), _v.end(), elem);
if (_v.end() == it || *it != elem) {
return std::make_pair(_v.insert(it, elem), true);
}
return std::make_pair(it, false);
}
else {
return std::make_pair(_v.end(), false);
}
}
auto find(const T& elem) const
-> const_iterator
{
auto vend = _v.end();
auto it = std::lower_bound(_v.begin(), vend, elem);
if (it != vend && *it != elem)
it = vend;
return it;
}
bool contains(const T& elem) const {
return find(elem) != _v.end();
}
const_iterator begin() const {
return _v.begin();
}
const_iterator end() const {
return _v.end();
}
private:
vec_type _v;
size_t _max_size;
};
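// Boost.Test supplies main() when BOOST_TEST_MODULE is defined before this header
#define BOOST_TEST_MODULE vector_set_demo
#include <boost/test/included/unit_test.hpp>
using namespace std;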
BOOST_AUTO_TEST_CASE(play_unique_vector)
{
vector_set<int> v(100);
for (size_t i = 0 ; i < 1000000 ; ++i) {
v.insert(int(random() % 200));
}
cout << "unique integers:" << endl;
copy(begin(v), end(v), ostream_iterator<int>(cout, ","));
cout << endl;
cout << "contains 100: " << v.contains(100) << endl;
cout << "contains 101: " << v.contains(101) << endl;
cout << "contains 102: " << v.contains(102) << endl;
cout << "contains 103: " << v.contains(103) << endl;
}

As you said you have many insertions and only one traversal, I'd suggest using a vector and pushing the elements in regardless of whether they are already in the vector. This is done in amortized O(1).
Only when you need to traverse the vector, sort it and remove the duplicate elements. I believe this can be done in O(n), as they are bounded integers.
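A minimal sketch of this approach, assuming one std::vector<int> per frame (add_value and deduplicate are hypothetical names): inserting is a plain push_back with no duplicate check, and std::sort followed by std::unique and erase removes the duplicates once, right before the end-of-frame traversal. This is the general O(n log n) version.

```cpp
#include <algorithm>
#include <vector>

// Per-insert cost: a push_back (amortized O(1)), no search for duplicates.
inline void add_value(std::vector<int>& frame_values, int value)
{
    frame_values.push_back(value);
}

// Called once per frame before traversal: sort, then erase the
// adjacent duplicates that std::unique moved to the tail.
inline void deduplicate(std::vector<int>& frame_values)
{
    std::sort(frame_values.begin(), frame_values.end());
    frame_values.erase(std::unique(frame_values.begin(), frame_values.end()),
                       frame_values.end());
}
```

Calling clear() on the vector at the end of the frame keeps its capacity, so after the first few frames no further allocations should be needed.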
EDIT: Sorting in linear time is possible with counting sort, as presented in this video. If that is not feasible, then you are back to O(n log n).
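For the linear-time variant, here is a counting-sort-style sketch, assuming every value lies in a known range [0, max_value] (max_value is an assumption; the question only says the values are simple integers, and sorted_unique_bounded is a hypothetical name). Because only uniqueness matters, the per-value counts collapse to one flag each, and reading the flags in order yields the distinct values already sorted, in O(n + max_value).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

inline std::vector<int> sorted_unique_bounded(const std::vector<int>& values,
                                              int max_value)
{
    // One flag per possible value: "seen at least once".
    std::vector<bool> seen(static_cast<std::size_t>(max_value) + 1, false);
    for (int v : values) {
        assert(v >= 0 && v <= max_value);  // precondition of this sketch
        seen[static_cast<std::size_t>(v)] = true;
    }

    // Walking the flags in index order emits each value once, sorted.
    std::vector<int> result;
    for (int v = 0; v <= max_value; ++v)
        if (seen[static_cast<std::size_t>(v)])
            result.push_back(v);
    return result;
}
```

This only pays off when max_value is small compared to the number of insertions per frame; otherwise scanning the flag array dominates.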
You will have very little cache miss because of the contiguity of the vector in memory, and very few allocations (especially if you reserve enough memory in the vector).

Related

Deciding two integer list contains common elements or not (c++)

I have a running-time issue with my C++ program. The program checks millions of times whether two integer lists have any element in common. I don't need to know which elements are common. I wrote the method below, but it doesn't look efficient, and I need to speed up the program. What is the best way to do this, or does C++ have a built-in method that does this comparison efficiently?
bool compareHSAndNewSet(list<int> hs , list<int> newset){
bool isCommon = false;
for(int x : hs){
for(int y : newset){
if(x == y){isCommon = true; break;}
}
if(isCommon == true) {break;}
}
return isCommon;
}
Hint: I don't know whether this matters, but the first input of the function (hs in the code) is sorted.
I was curious about the various strategies, so I made the simple benchmark below.
However, I wouldn't try to sort the second container; comparing all the data inside a container and moving them around seems to be overkill just to find one element in the intersection.
The program gives these results (total seconds) on my computer (Intel(R) Core(TM) i7-10875H CPU @ 2.30GHz):
vectors: 1.41164
vectors (dichotomic): 0.0187354
lists: 12.0402
lists (dichotomic): 13.4844
If we ignore that the first container is sorted and iterate its elements in order, we can see that a simpler container with adjacent storage of the elements (a vector here) is much better than multiple elements spread in memory (a list here): 1.41164 s versus 12.0402 s (an 8.5x speedup).
But if we take into account that the first container is sorted (as stated in the question), a dichotomic (binary search) approach improves the situation even further.
The best case (dichotomic approach on vectors) is far better than the original case (in-order approach on lists): 0.0187354 s versus 12.0402 s (a 642x speedup).
Of course, all of this depends on many other factors (sizes of datasets, distributions of the values...); this is just a micro benchmark, and a specific application could behave differently.
Note that in the question, the parameters were passed by value; this will probably cause some unneeded copies (except if a move operation is used at the call site, but I would find that uncommon for such a function). I switched to pass-by-reference-on-const instead.
Note also that a dichotomic approach on a list is a pessimisation (no random access for the iterators, so it's still linear but more complicated than the simplest linear approach).
edit: my original code was wrong; thanks to @bitmask I changed it. It does not change the general idea.
/**
g++ -std=c++17 -o prog_cpp prog_cpp.cpp \
-pedantic -Wall -Wextra -Wconversion -Wno-sign-conversion \
-O3 -DNDEBUG -march=native
**/
#include <list>
#include <vector>
#include <algorithm>
#include <chrono>
#include <random>
#include <tuple>
#include <iostream>
template<typename Container>
bool
compareHSAndNewSet(const Container &hs,
const Container &newset)
{
for(const auto &elem: newset)
{
const auto it=std::find(cbegin(hs), cend(hs), elem);
if(it!=cend(hs))
{
return true; // found common element
}
}
return false; // no common element
}
template<typename Container>
bool
compareHSAndNewSet_dichotomic(const Container &hs,
const Container &newset)
{
for(const auto &elem: newset)
{
if(std::binary_search(cbegin(hs), cend(hs), elem))
{
return true; // found common element
}
}
return false; // no common element
}
std::tuple<std::vector<int>, // hs
std::vector<int>> // newset
prepare_vectors()
{
static auto rnd_gen=std::default_random_engine {std::random_device{}()};
constexpr auto sz=10'000;
auto distr=std::uniform_int_distribution<int>{0, 10*sz};
auto hs=std::vector<int>{};
auto newset=std::vector<int>{};
for(auto i=0; i<sz; ++i)
{
hs.emplace_back(distr(rnd_gen));
newset.emplace_back(distr(rnd_gen));
}
std::sort(begin(hs), end(hs));
return {hs, newset};
}
std::tuple<std::list<int>, // hs
std::list<int>> // newset
prepare_lists(const std::vector<int> &hs,
const std::vector<int> &newset)
{
return {std::list(cbegin(hs), cend(hs)),
std::list(cbegin(newset), cend(newset))};
}
double // seconds (1e-6 precision) since 1970/01/01 00:00:00 UTC
get_time()
{
const auto now=std::chrono::system_clock::now().time_since_epoch();
const auto us=std::chrono::duration_cast<std::chrono::microseconds>(now);
return 1e-6*double(us.count());
}
int
main()
{
constexpr auto generations=100;
constexpr auto iterations=1'000;
auto duration_v=0.0;
auto duration_vd=0.0;
auto duration_l=0.0;
auto duration_ld=0.0;
for(auto g=0; g<generations; ++g)
{
const auto [hs_v, newset_v]=prepare_vectors();
const auto [hs_l, newset_l]=prepare_lists(hs_v, newset_v);
for(auto i=-1; i<iterations; ++i)
{
const auto t0=get_time();
const auto comp_v=compareHSAndNewSet(hs_v, newset_v);
const auto t1=get_time();
const auto comp_vd=compareHSAndNewSet_dichotomic(hs_v, newset_v);
const auto t2=get_time();
const auto comp_l=compareHSAndNewSet(hs_l, newset_l);
const auto t3=get_time();
const auto comp_ld=compareHSAndNewSet_dichotomic(hs_l, newset_l);
const auto t4=get_time();
if((comp_v!=comp_vd)||(comp_v!=comp_l)||(comp_v!=comp_ld))
{
std::cerr << "comparison mismatch\n";
}
if(i>=0) // first iteration is dry-run (warmup)
{
duration_v+=t1-t0;
duration_vd+=t2-t1;
duration_l+=t3-t2;
duration_ld+=t4-t3;
}
}
}
std::cout << "vectors: " << duration_v << '\n';
std::cout << "vectors (dichotomic): " << duration_vd << '\n';
std::cout << "lists: " << duration_l << '\n';
std::cout << "lists (dichotomic): " << duration_ld << '\n';
return 0;
}
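You can try sorting the lists and using std::set_intersection.
#include <algorithm>
#include <iterator>
#include <list>
using namespace std;
bool compareHSAndNewSet(list<int> hs , list<int> newset){
hs.sort();
newset.sort();
// the intersection can never be larger than the smaller input
list<int> commonElts (min(hs.size(), newset.size()));
list<int>::iterator i = set_intersection(hs.begin(), hs.end(), newset.begin(), newset.end(), commonElts.begin());
// list iterators are not random access, so use std::distance instead of i - begin()
commonElts.resize(distance(commonElts.begin(), i));
// true if the two lists share at least one element
return !commonElts.empty();
}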
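Since only a yes/no answer is needed, building the whole intersection does more work than necessary. A sketch, assuming both ranges are sorted ascending (sorted_ranges_intersect and have_common_element are hypothetical names): walk both ranges in lockstep, similar to the merge-style walk std::set_intersection typically performs, and return at the first equal pair.

```cpp
#include <list>

template <typename It1, typename It2>
bool sorted_ranges_intersect(It1 first1, It1 last1, It2 first2, It2 last2)
{
    while (first1 != last1 && first2 != last2) {
        if (*first1 < *first2)
            ++first1;        // advance the range holding the smaller value
        else if (*first2 < *first1)
            ++first2;
        else
            return true;     // equal values: a common element exists
    }
    return false;            // one range exhausted without a match
}

inline bool have_common_element(std::list<int> hs, std::list<int> newset)
{
    hs.sort();       // already sorted per the question; re-sorting is harmless
    newset.sort();
    return sorted_ranges_intersect(hs.begin(), hs.end(),
                                   newset.begin(), newset.end());
}
```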
I'd use std::unordered_map<> (or std::unordered_set<>, since no mapped value is needed) to add the first list to, then check whether each element of the second list exists in the map. This would end up iterating each list once, doing length(first) insertions and length(second) lookups on the map.
std::unordered_map<> should have an average lookup and insertion complexity of O(1), though the worst case is O(n).
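A minimal sketch of this idea, using std::unordered_set instead of std::unordered_map because no mapped value is needed (share_element_hashed is a hypothetical name). It stops at the first element of the second list that is found in the set.

```cpp
#include <list>
#include <unordered_set>

inline bool share_element_hashed(const std::list<int>& first,
                                 const std::list<int>& second)
{
    // length(first) insertions, O(1) each on average.
    std::unordered_set<int> lookup(first.begin(), first.end());
    // At most length(second) lookups, returning on the first hit.
    for (int value : second)
        if (lookup.count(value) != 0)
            return true;
    return false;
}
```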

C++ - STL - Vector - Why there is no facility to indicate relocation_count in the vector

It is quite surprising that, although reserving capacity for a vector in advance improves the performance of an application and avoids costly reallocations when the vector fills up to its capacity, there is no facility to get a relocation_count at any given time. Such a count would help programmers track the optimal size to reserve in cases where the exact capacity is not known upfront and has to be estimated from an average over a period of observations.
To count re-allocations of a std::vector, the std::vector (or at least the write access methods of it) might be wrapped into a helper class.
Sample code:
#include <iostream>
#include <vector>
template <typename VALUE>
struct AllocCounter {
std::vector<VALUE> &vec;
unsigned n;
AllocCounter(std::vector<VALUE> &vec): vec(vec), n(0) { }
void push_back(const VALUE &value)
{
size_t old = vec.capacity();
vec.push_back(value);
n += old != vec.capacity();
}
};
int main()
{
std::vector<int> values;
AllocCounter<int> countAllocs(values);
for (int i = 1; i <= 1024; ++i) {
unsigned nOld = countAllocs.n;
countAllocs.push_back(i);
if (countAllocs.n > nOld) std::cout << 'R';
std::cout << '.';
}
std::cout << '\n'
<< "Number of (re-)allocations: " << countAllocs.n << '\n';
// done
return 0;
}
Output:
R.R.R..R....R........R................R................................R................................................................R................................................................................................................................R................................................................................................................................................................................................................................................................R................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Number of (re-)allocations: 11
This sample is only a proof of concept, as it doesn't cover std::vector::emplace(), std::vector::resize(), etc.
Note also that calling std::vector::push_back() directly bypasses the counter (and may "overlook" reallocations).
Using a custom allocator could solve this limitation.
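A sketch of the custom-allocator route, assuming C++17 (for the inline static data member). CountingAllocator and counted_vector are hypothetical names: the allocator forwards to std::allocator and increments a counter on every allocate() call, so every (re-)allocation is counted no matter which member function (push_back, emplace, resize, reserve, ...) triggered it. The counter is shared by all vectors with the same element type.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

template <typename T>
struct CountingAllocator {
    using value_type = T;

    // One counter per element type T (C++17 inline static member).
    static inline std::size_t allocations = 0;

    CountingAllocator() = default;
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n)
    {
        ++allocations;                       // count every storage request
        return std::allocator<T>{}.allocate(n);
    }
    void deallocate(T* p, std::size_t n) noexcept
    {
        std::allocator<T>{}.deallocate(p, n);
    }
};

// Stateless allocators: any two instances can free each other's memory.
template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

template <typename T>
using counted_vector = std::vector<T, CountingAllocator<T>>;
```

For example, after reserve(1024) on an empty counted_vector<int>, pushing 1024 elements leaves the counter at 1 (the reserve itself). Without the reserve, the count depends on the library's growth factor.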

Is list better than vector when we need to store "the last n items"?

There are a lot of questions which suggest that one should always use a vector, but it seems to me that a list would be better for the scenario where we need to store "the last n items".
For example, say we need to store the last 5 items seen:
Iteration 0:
3,24,51,62,37,
Then at each iteration, the item at index 0 is removed, and the new item is added at the end:
Iteration 1:
24,51,62,37,8
Iteration 2:
51,62,37,8,12
It seems that for this use case, for a vector the complexity will be O(n), since we would have to copy n items, but in a list, it should be O(1), since we are always just chopping off the head, and adding to the tail each iteration.
Is my understanding correct? Is this the actual behaviour of an std::list ?
Neither. Your collection has a fixed size and std::array is sufficient.
The data structure you implement is called a ring buffer. To implement it you create an array and keep track of the offset of the current first element.
When you add an element that would push an item out of the buffer - i.e. when you remove the first element - you increment the offset.
To fetch elements in the buffer you add the index and the offset and take the modulo of this and the length of the buffer.
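The steps above can be sketched as follows, assuming T is default constructible and N > 0 (LastNBuffer is a hypothetical name). first_ is the offset of the oldest element; once the buffer is full, each push overwrites that element and advances the offset, and logical element i is read at (first_ + i) % N.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

template <typename T, std::size_t N>
class LastNBuffer {
    static_assert(N > 0, "buffer needs at least one slot");
public:
    void push(const T& value)
    {
        if (count_ < N) {
            data_[(first_ + count_) % N] = value;  // still filling up
            ++count_;
        } else {
            data_[first_] = value;        // overwrite the oldest element...
            first_ = (first_ + 1) % N;    // ...so the next one becomes first
        }
    }
    // Index 0 is the oldest stored element.
    const T& operator[](std::size_t i) const
    {
        assert(i < count_);
        return data_[(first_ + i) % N];
    }
    std::size_t size() const { return count_; }

private:
    std::array<T, N> data_{};
    std::size_t first_ = 0;  // offset of the oldest element
    std::size_t count_ = 0;  // number of stored elements (<= N)
};
```

Pushing 3, 24, 51, 62, 37, 8, 12 (the sequence from the question) leaves 51, 62, 37, 8, 12 in logical order, without moving any element.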
std::deque is a far better option. Or if you had benchmarked std::deque and found its performance to be inadequate for your specific use, you could implement a circular buffer in a fixed size array, storing the index of the start of the buffer. When replacing an element in the buffer, you would overwrite the element at the start index, and then set the start index to its previous value plus one modulo the size of the buffer.
List traversal is very slow, as list elements can be scattered throughout memory, and vector shifting is actually surprisingly fast, as memory moves on a single block of memory are quite fast even if it is a large block.
The talk Taming The Performance Beast from the Meeting C++ 2015 conference might be of interest to you.
If you can use Boost, try boost::circular_buffer:
It's a kind of sequence similar to std::list or std::deque. It supports random access iterators, constant time insert and erase operations at the beginning or the end of the buffer and interoperability with std algorithms.
It provides fixed capacity storage: when the buffer is filled, new data is written starting at the beginning of the buffer and overwriting the old
#include <boost/circular_buffer.hpp>
// Create a circular buffer with a capacity for 5 integers.
boost::circular_buffer<int> cb(5);
// Insert elements into the buffer.
cb.push_back(3);
cb.push_back(24);
cb.push_back(51);
cb.push_back(62);
cb.push_back(37);
int a = cb[0]; // a == 3
int b = cb[1]; // b == 24
int c = cb[2]; // c == 51
// The buffer is full now, so pushing subsequent
// elements will overwrite the front-most elements.
cb.push_back(8); // overwrite 3 with 8
cb.push_back(12); // overwrite 24 with 12
// The buffer now contains 51, 62, 37, 8, 12.
// Elements can be popped from either the front or the back.
cb.pop_back(); // 12 is removed
cb.pop_front(); // 51 is removed
The circular_buffer stores its elements in a contiguous region of memory, which then enables fast constant-time insertion, removal and random access of elements.
PS ... or implement the circular buffer directly as suggested by Taemyr.
Overload Journal #50 - Aug 2002 has a nice introduction (by Pete Goodliffe) to writing robust STL-like circular buffer.
The problem is that O(n) only talks about the asymptotic behaviour as n tends to infinity. If n is small then the constant factors involved become significant. The result is that for "last 5 integer items" I would be stunned if vector didn't beat list. I would even expect std::vector to beat std::deque.
For "last 500 integer items" I would still expect std::vector to be faster than std::list, but std::deque would now probably win. For "last 5 million slow-to-copy items", std::vector would be slowest of all.
A ring buffer based on std::array or std::vector would probably be faster still though.
As (almost) always with performance issues:
encapsulate with a fixed interface
write the simplest code that can implement that interface
if profiling shows you have a problem, optimize (which will make the code more complicated).
In practice, just using a std::deque, or a pre-built ring buffer if you have one, will be good enough. (But it's not worth going to the trouble of writing a ring buffer unless profiling says you need to.)
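As a sketch of that advice, here is a small fixed interface backed by std::deque (LastN is a hypothetical name). Callers only see push(), size() and operator[], so the deque could later be replaced by a ring buffer without touching them.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

template <typename T>
class LastN {
public:
    explicit LastN(std::size_t capacity) : capacity_(capacity) {}

    void push(const T& value)
    {
        if (capacity_ == 0)
            return;                    // nothing can be kept
        if (items_.size() == capacity_)
            items_.pop_front();        // drop the oldest item
        items_.push_back(value);       // append the newest item
    }

    std::size_t size() const { return items_.size(); }

    // Index 0 is the oldest item still kept.
    const T& operator[](std::size_t i) const
    {
        assert(i < items_.size());
        return items_[i];
    }

private:
    std::size_t capacity_;
    std::deque<T> items_;
};
```

std::deque's pop_front and push_back are constant time, and neither moves the other elements.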
Here is a minimal circular buffer. I'm primarily posting it here to get a metric ton of comments and ideas for improvement.
Minimal Implementation
#include <iterator>
template<typename Container>
class CircularBuffer
{
public:
using iterator = typename Container::iterator;
using value_type = typename Container::value_type;
private:
Container _container;
iterator _pos;
public:
CircularBuffer() : _pos(std::begin(_container)) {}
public:
value_type& operator*() const { return *_pos; }
CircularBuffer& operator++() { ++_pos ; if (_pos == std::end(_container)) _pos = std::begin(_container); return *this; }
CircularBuffer& operator--() { if (_pos == std::begin(_container)) _pos = std::end(_container); --_pos; return *this; }
};
Usage
#include <iostream>
#include <array>
int main()
{
CircularBuffer<std::array<int,5>> buf;
*buf = 1; ++buf;
*buf = 2; ++buf;
*buf = 3; ++buf;
*buf = 4; ++buf;
*buf = 5; ++buf;
std::cout << *buf << " "; ++buf;
std::cout << *buf << " "; ++buf;
std::cout << *buf << " "; ++buf;
std::cout << *buf << " "; ++buf;
std::cout << *buf << " "; ++buf;
std::cout << *buf << " "; ++buf;
std::cout << *buf << " "; ++buf;
std::cout << *buf << " "; --buf;
std::cout << *buf << " "; --buf;
std::cout << *buf << " "; --buf;
std::cout << *buf << " "; --buf;
std::cout << *buf << " "; --buf;
std::cout << *buf << " "; --buf;
std::cout << std::endl;
}
Compile with
g++ -std=c++17 -O2 -Wall -Wextra -pedantic -Werror
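If you need to store the last N elements, then logically you are implementing some kind of queue or circular buffer: std::queue is a FIFO adaptor (built on std::deque by default) and std::stack a LIFO adaptor, while std::deque itself supports efficient insertion and removal at both ends.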
You can use boost::circular_buffer or implement simple circular buffer manually:
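#include <cassert>
#include <cstdlib>
#include <iostream>
template<int Capacity>
class cbuffer
{
public:
cbuffer() : sz(0), p(0){}
// write at p; once the buffer is full this overwrites the oldest element
void push_back(int n)
{
buf[p++] = n;
if (sz < Capacity)
sz++;
if (p >= Capacity)
p = 0;
}
int size() const
{
return sz;
}
// index 0 is the oldest stored element
int operator[](int n) const
{
assert(n >= 0 && n < sz);
n = p - sz + n;
if (n < 0)
n += Capacity;
return buf[n];
}
int buf[Capacity];
int sz, p;
};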
Sample use for circular buffer of 5 int elements:
int main()
{
cbuffer<5> buf;
// insert random 100 numbers
for (int i = 0; i < 100; ++i)
buf.push_back(std::rand());
// output to cout contents of the circular buffer
for (int i = 0; i < buf.size(); ++i)
std::cout << buf[i] << ' ';
}
As a note, keep in mind that when you have only 5 elements the best solution is the one that's fast to implement and works correctly.
Yes. The time complexity of removing elements from the beginning of a std::vector is linear. std::deque might be a good choice for what you are doing, as it offers constant-time insertion and removal at both the beginning and the end, and typically better performance than std::list.
Source:
http://www.sgi.com/tech/stl/Vector.html
http://www.sgi.com/tech/stl/Deque.html
Here are the beginnings of a ring-buffer-based deque template class that I wrote a while ago, mostly to experiment with using std::allocator (so it does not require T to be default constructible). Note that it currently doesn't have iterators, insert/remove, copy/move constructors, etc.
#ifndef RING_DEQUEUE_H
#define RING_DEQUEUE_H
#include <cstddef>
#include <initializer_list>
#include <limits>
#include <memory>
#include <stdexcept>
#include <type_traits>
#include <utility>
template <typename T, size_t N>
class ring_dequeue {
private:
static_assert(N <= std::numeric_limits<size_t>::max() / 2 &&
N <= std::numeric_limits<size_t>::max() / sizeof(T),
"size of ring_dequeue is too large");
using alloc_traits = std::allocator_traits<std::allocator<T>>;
public:
using value_type = T;
using reference = T&;
using const_reference = const T&;
using difference_type = std::ptrdiff_t; // portable replacement for POSIX ssize_t
using size_type = size_t;
ring_dequeue() = default;
// Disable copy and move constructors for now - if iterators are
// implemented later, then those could be delegated to the InputIterator
// constructor below (using the std::move_iterator adaptor for the move
// constructor case).
ring_dequeue(const ring_dequeue&) = delete;
ring_dequeue(ring_dequeue&&) = delete;
ring_dequeue& operator=(const ring_dequeue&) = delete;
ring_dequeue& operator=(ring_dequeue&&) = delete;
template <typename InputIterator>
ring_dequeue(InputIterator begin, InputIterator end) {
while (m_tailIndex < N && begin != end) {
alloc_traits::construct(m_alloc, reinterpret_cast<T*>(m_buf) + m_tailIndex,
*begin);
++m_tailIndex;
++begin;
}
if (begin != end)
throw std::logic_error("Input range too long");
}
ring_dequeue(std::initializer_list<T> il) :
ring_dequeue(il.begin(), il.end()) { }
~ring_dequeue() noexcept(std::is_nothrow_destructible<T>::value) {
while (m_headIndex < m_tailIndex) {
alloc_traits::destroy(m_alloc, elemPtr(m_headIndex));
m_headIndex++;
}
}
size_t size() const {
return m_tailIndex - m_headIndex;
}
size_t max_size() const {
return N;
}
bool empty() const {
return m_headIndex == m_tailIndex;
}
bool full() const {
return m_headIndex + N == m_tailIndex;
}
template <typename... Args>
void emplace_front(Args&&... args) {
if (full())
throw std::logic_error("ring_dequeue full");
bool wasAtZero = (m_headIndex == 0);
auto newHeadIndex = wasAtZero ? (N - 1) : (m_headIndex - 1);
alloc_traits::construct(m_alloc, elemPtr(newHeadIndex),
std::forward<Args>(args)...);
m_headIndex = newHeadIndex;
if (wasAtZero)
m_tailIndex += N;
}
void push_front(const T& x) {
emplace_front(x);
}
void push_front(T&& x) {
emplace_front(std::move(x));
}
template <typename... Args>
void emplace_back(Args&&... args) {
if (full())
throw std::logic_error("ring_dequeue full");
alloc_traits::construct(m_alloc, elemPtr(m_tailIndex),
std::forward<Args>(args)...);
++m_tailIndex;
}
void push_back(const T& x) {
emplace_back(x);
}
void push_back(T&& x) {
emplace_back(std::move(x));
}
T& front() {
if (empty())
throw std::logic_error("ring_dequeue empty");
return *elemPtr(m_headIndex);
}
const T& front() const {
if (empty())
throw std::logic_error("ring_dequeue empty");
return *elemPtr(m_headIndex);
}
void remove_front() {
if (empty())
throw std::logic_error("ring_dequeue empty");
alloc_traits::destroy(m_alloc, elemPtr(m_headIndex));
++m_headIndex;
if (m_headIndex == N) {
m_headIndex = 0;
m_tailIndex -= N;
}
}
T pop_front() {
T result = std::move(front());
remove_front();
return result;
}
T& back() {
if (empty())
throw std::logic_error("ring_dequeue empty");
return *elemPtr(m_tailIndex - 1);
}
const T& back() const {
if (empty())
throw std::logic_error("ring_dequeue empty");
return *elemPtr(m_tailIndex - 1);
}
void remove_back() {
if (empty())
throw std::logic_error("ring_dequeue empty");
alloc_traits::destroy(m_alloc, elemPtr(m_tailIndex - 1));
--m_tailIndex;
}
T pop_back() {
T result = std::move(back());
remove_back();
return result;
}
private:
alignas(T) char m_buf[N * sizeof(T)];
size_t m_headIndex = 0;
size_t m_tailIndex = 0;
std::allocator<T> m_alloc;
const T* elemPtr(size_t index) const {
if (index >= N)
index -= N;
return reinterpret_cast<const T*>(m_buf) + index;
}
T* elemPtr(size_t index) {
if (index >= N)
index -= N;
return reinterpret_cast<T*>(m_buf) + index;
}
};
#endif
In brief, std::vector is best when the amount of memory does not change. In your case, moving all the data forward or appending new data to a vector would be wasteful. As @David said, std::deque is a good option, since you would pop_front and push_back, i.e. use it like a two-way list.
From the cplusplus.com reference on std::list:
Compared to other base standard sequence containers (array, vector and deque), lists perform generally better in inserting, extracting and moving elements in any position within the container for which an iterator has already been obtained, and therefore also in algorithms that make intensive use of these, like sorting algorithms.
The main drawback of lists and forward_lists compared to these other sequence containers is that they lack direct access to the elements by their position; For example, to access the sixth element in a list, one has to iterate from a known position (like the beginning or the end) to that position, which takes linear time in the distance between these. They also consume some extra memory to keep the linking information associated to each element (which may be an important factor for large lists of small-sized elements).
On std::deque:
For operations that involve frequent insertion or removals of elements at positions other than the beginning or the end, deques perform worse and have less consistent iterators and references than lists and forward lists.
On std::vector:
Therefore, compared to arrays, vectors consume more memory in exchange for the ability to manage storage and grow dynamically in an efficient way.
Compared to the other dynamic sequence containers (deques, lists and forward_lists), vectors are very efficient accessing its elements (just like arrays) and relatively efficient adding or removing elements from its end. For operations that involve inserting or removing elements at positions other than the end, they perform worse than the others, and have less consistent iterators and references than lists and forward_lists.
I think even std::deque has some copying overhead under certain conditions, because a std::deque is essentially a map of arrays (when that map fills up, it has to be reallocated and its block pointers copied), so std::list is a good way to avoid that overhead.
To improve the traversal performance of std::list, you can implement a memory pool so that the std::list allocates its nodes from one chunk, which gives better spatial locality and therefore better cache behavior.
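One way to get such a pool without writing an allocator, assuming a C++17 standard library that ships <memory_resource> (a swapped-in technique, not necessarily what this answer had in mind): std::pmr::list with a std::pmr::unsynchronized_pool_resource. The pool carves list nodes out of larger chunks and reuses nodes freed by pop_front for later push_back calls. keep_last_n is a hypothetical helper.

```cpp
#include <cstddef>
#include <list>
#include <memory_resource>

// Keeps the last n of count values in a list whose nodes come from pool.
inline std::pmr::list<int> keep_last_n(const int* values, std::size_t count,
                                       std::size_t n,
                                       std::pmr::memory_resource* pool)
{
    std::pmr::list<int> window(pool);  // every node is allocated from pool
    for (std::size_t i = 0; i < count; ++i) {
        window.push_back(values[i]);
        if (window.size() > n)
            window.pop_front();        // freed node returns to the pool
    }
    return window;
}
```

The pool must outlive every list that allocates from it; unsynchronized_pool_resource is not thread-safe, which matches a single-threaded game loop.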

Using boost::iostreams::mapped_file_source with std::multimap

I have a rather large amount of data to analyse; each file is about 5 GB. Each file has the following format:
xxxxx yyyyy
Both key and value can repeat, but the keys are sorted in increasing order. I'm trying to use a memory mapped file for this purpose and then find the required keys and work with them. This is what I've written:
if (data_file != "")
{
clock_start = clock();
data_file_mapped.open(data_file);
data_multimap = (std::multimap<double, unsigned int> *)data_file_mapped.data();
if (data_multimap != NULL)
{
std::multimap<double, unsigned int>::iterator it = data_multimap->find(keys_to_find[4]);
if (it != data_multimap->end())
{
std::cout << "Element found.";
for (std::multimap<double, unsigned int>::iterator it = data_multimap->lower_bound(keys_to_find[4]); it != data_multimap->upper_bound(keys_to_find[5]); ++it)
{
std::cout << it->second;
}
std::cout << "\n";
clock_end = clock();
std::cout << "Time taken to read in the file: " << double(clock_end - clock_start) / CLOCKS_PER_SEC << "\n";
}
else
std::cerr << "Element not found at all" << "\n";
}
else
std::cerr << "Nope - no data received."<< "\n";
}
Basically, I need to locate ranges of keys and pull those chunks out to work on. I get a segfault the first time I call a method on the multimap, for example find. I tried upper_bound, lower_bound and other methods too, and still get a segfault.
This is what gdb gives me:
Program received signal SIGSEGV, Segmentation fault.
_M_lower_bound (this=<optimized out>, __k=<optimized out>, __y=<optimized out>, __x=0xa31202030303833) at /usr/include/c++/4.9.2/bits/stl_tree.h:1261
1261 if (!_M_impl._M_key_compare(_S_key(__x), __k))
Could someone please point out what I'm doing wrong? I've only been able to find simplistic examples on memory mapped files - nothing like this yet. Thanks.
EDIT: More information:
The file I described above is basically a two-column plain-text file that a neural simulator produces as the output of my simulations. It's as simple as this:
$ du -hsc 201501271755.e.ras
4.9G 201501271755.e.ras
4.9G total
$ head 201501271755.e.ras
0.013800 0
0.013800 1
0.013800 10
0.013800 11
0.013800 12
0.013800 13
0.013800 14
0.013800 15
0.013800 16
0.013800 17
The first column is the time, and the second column is the neuron that fired at that time (it's a spike-time raster file). Actually, the simulator writes a file like this for each MPI rank used to run the simulation. The various files have been combined into this master file using sort -g -m. More information on the file format is here: http://www.fzenke.net/auryn/doku.php?id=manual:ras
To calculate the firing rate and other metrics of the neuron set at certain times of the simulation, I need to locate the time in the file, pull out the chunk between [time - 1, time], and run some metrics on this chunk. This file is quite small, and I expect the size to increase quite a bit as my simulations get more complicated and run for longer time periods. That's why I began looking into memory-mapped files. I hope that clarifies the problem statement somewhat. I only need to read the output file to process the information it contains; I do not need to modify it at all.
To process the data, I'll use multiple threads again, but since all my operations on the map are read-only, I don't expect to run into trouble there.
Multimaps aren't laid out sequentially in memory (they're node-based containers, but I digress). Even if they were, the chances that their layout would match that of the text input are slim: the file contains characters, while a multimap contains binary keys, values and tree pointers. (The bogus node pointer __x=0xa31202030303833 in your gdb trace is made of ASCII bytes such as '\n', '1', ' ' and '0' from the text file.)
There are basically two ways you can make this work:
Keep using the multimap but use a custom allocator (so that all allocations are done in the mapped memory region). This is the "nicest" from a high-level C++ viewpoint, /but/ you will need to switch your file to a binary format.
If you can, this is what I'd suggest. Boost Container + Boost Interprocess have everything you need to make this relatively painless.
You write a custom container "abstraction" that works directly on the mapped data. You could either
recognize a "xxxx yyyy" pair from anywhere (line ends?) or
build an index of all line starts in the file.
Using these, you can devise an iterator (with Boost Iterator's iterator_facade) that you can use to implement higher-level operations (lower_bound, upper_bound and equal_range).
Once you have these, you're basically all set to query this memory map as a readonly key-value database.
Sadly, this kind of memory representation would be extremely bad for performance if you also want to support mutating operations (insert, remove).
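A rough sketch of the "build an index of all line starts" variant, using only the standard library rather than Boost Iterator. It assumes keys sorted in increasing order, '\n' line endings, no blank lines, and a space after every key; std::string_view stands in for the mapped region, and key_at, index_line_starts, lower_bound_offset and upper_bound_offset are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string_view>
#include <vector>

// Hypothetical helper: parse the key (first column) of the line starting at
// `line`. strtod stops at the space after the key, so with "key value" lines
// it never reads past the current line.
double key_at(const char* line) {
    return std::strtod(line, nullptr);
}

// One pass over the mapped text: record the offset where each line starts.
std::vector<std::size_t> index_line_starts(std::string_view text) {
    std::vector<std::size_t> starts;
    if (!text.empty())
        starts.push_back(0);
    for (std::size_t i = 0; i + 1 < text.size(); ++i)
        if (text[i] == '\n')
            starts.push_back(i + 1);
    return starts;
}

// Offset of the first line whose key is >= key (text.size() if none).
// Binary search over the index; requires keys sorted in increasing order.
std::size_t lower_bound_offset(std::string_view text,
                               const std::vector<std::size_t>& starts, double key) {
    auto it = std::lower_bound(starts.begin(), starts.end(), key,
        [&](std::size_t off, double k) { return key_at(text.data() + off) < k; });
    return it == starts.end() ? text.size() : *it;
}

// Offset of the first line whose key is > key (text.size() if none).
std::size_t upper_bound_offset(std::string_view text,
                               const std::vector<std::size_t>& starts, double key) {
    auto it = std::upper_bound(starts.begin(), starts.end(), key,
        [&](double k, std::size_t off) { return k < key_at(text.data() + off); });
    return it == starts.end() ? text.size() : *it;
}
```

The bytes between lower_bound_offset(text, idx, time - 1) and upper_bound_offset(text, idx, time) are exactly the lines of a [time - 1, time] chunk. The index costs one std::size_t per line, which is substantial for a multi-gigabyte file; snapping bisection points to line starts instead avoids storing it.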
If you have an actual sample of the file, I could do a demonstration of either of the approaches described.
Update
Quick Samples:
With boost::interprocess you can (very) simply define the multimap you desire:
namespace shared {
namespace bc = boost::container;
template <typename T> using allocator = bip::allocator<T, bip::managed_mapped_file::segment_manager>;
template <typename K, typename V>
using multi_map = bc::flat_multimap<
K, V, std::less<K>,
allocator<typename bc::flat_multimap<K, V>::value_type> >;
}
Notes:
I chose flatmap (flat_multimap, actually) because it is likely more
storage efficient, and is much more comparable to the second approach
(given below);
Note that this choice affects iterator/reference stability and
favours read-only operations pretty heavily. If you need iterator
stability and/or many mutating operations, use a regular map (or for
very high volumes a hash_map) instead of the flat variations.
I chose a managed_mapped_file segment for this demonstration (so you get persistence). The demo shows how 10G is sparsely pre-allocated, but only the space actually allocated is used on disk. You could equally well use a managed_shared_memory.
If you have binary persistence, you might discard the text datafile altogether.
I parse the text data into a shared::multi_map<double, unsigned> from a mapped_file_source using Boost Spirit. The implementation is fully generic.
There is no need to write iterator classes, start_of_line(), end_of_line(), lower_bound(), upper_bound(), equal_range() or any of those, since they're already standard in the multi_map interface, so all we need to do is write main:
#define NDEBUG
#undef DEBUG
#include <boost/iostreams/device/mapped_file.hpp>
#include <boost/fusion/adapted/std_pair.hpp>
#include <boost/container/flat_map.hpp>
#include <boost/interprocess/managed_mapped_file.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iomanip>
#include <stdexcept>
#include <string>
namespace bip = boost::interprocess;
namespace qi = boost::spirit::qi;
namespace shared {
namespace bc = boost::container;
template <typename T> using allocator = bip::allocator<T, bip::managed_mapped_file::segment_manager>;
template <typename K, typename V>
using multi_map = bc::flat_multimap<
K, V, std::less<K>,
allocator<typename bc::flat_multimap<K, V>::value_type> >;
}
#include <iostream>
bip::managed_mapped_file msm(bip::open_or_create, "lookup.bin", 10ul<<30);
template <typename K, typename V>
shared::multi_map<K,V>& get_or_load(const char* fname) {
using Map = shared::multi_map<K, V>;
Map* lookup = msm.find_or_construct<Map>("lookup")(msm.get_segment_manager());
if (lookup->empty()) {
// only read input file if not already loaded
boost::iostreams::mapped_file_source input(fname);
auto f(input.data()), l(f + input.size());
bool ok = qi::phrase_parse(f, l,
(qi::auto_ >> qi::auto_) % qi::eol >> *qi::eol,
qi::blank, *lookup);
if (!ok || (f!=l))
throw std::runtime_error("Error during parsing at position #" + std::to_string(f - input.data()));
}
return *lookup;
}
int main() {
// parse text file into shared memory binary representation
auto const& lookup = get_or_load<double, unsigned int>("input.txt");
auto const e = lookup.end();
for(auto&& line : lookup)
{
std::cout << line.first << "\t" << line.second << "\n";
auto er = lookup.equal_range(line.first);
if (er.first != e) std::cout << " lower: " << er.first->first << "\t" << er.first->second << "\n";
if (er.second != e) std::cout << " upper: " << er.second->first << "\t" << er.second->second << "\n";
}
}
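The flat_multimap chosen above is, at heart, one sorted contiguous array. Here is a minimal standard-library sketch of that idea using the question's (time, neuron) records; it is not Boost.Container's implementation, and FlatMultimap and Spike are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

using Spike = std::pair<double, unsigned>;   // (time, neuron)

// A "flat multimap" in miniature: sorted (key, value) pairs in one vector.
// Lookups are binary searches over contiguous memory; there are no per-node
// allocations, but inserting in the middle shifts the tail of the vector.
struct FlatMultimap {
    std::vector<Spike> data;   // kept sorted by key

    void insert(double key, unsigned value) {
        // upper_bound keeps equal keys in insertion order, like std::multimap
        auto pos = std::upper_bound(data.begin(), data.end(), key,
            [](double k, const Spike& s) { return k < s.first; });
        data.insert(pos, Spike{key, value});
    }

    // [first, last) range of entries with lo <= key <= hi
    std::pair<std::vector<Spike>::const_iterator, std::vector<Spike>::const_iterator>
    range(double lo, double hi) const {
        auto first = std::lower_bound(data.begin(), data.end(), lo,
            [](const Spike& s, double k) { return s.first < k; });
        auto last = std::upper_bound(first, data.end(), hi,
            [](double k, const Spike& s) { return k < s.first; });
        return {first, last};
    }
};
```

This makes the trade-off in the notes concrete: reads are cheap and cache-friendly, while every insert may move elements and invalidate iterators.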
For the second approach, I implemented it exactly as I described:
simple container over the raw const char* region mapped;
using boost::iterator_facade to make an iterator that parses the text on dereference;
for printing the input lines I use boost::string_ref - which avoids dynamic allocations for copying strings.
parsing is done with Spirit Qi:
if (!qi::phrase_parse(
b, _data.end,
qi::auto_ >> qi::auto_ >> qi::eoi,
qi::space,
_data.key, _data.value))
Qi was chosen for speed and genericity: you can choose the Key and Value types at instantiation time:
text_multi_lookup<double, unsigned int> tml(map.data(), map.data() + map.size());
I've implemented lower_bound, upper_bound and equal_range member functions that take advantage of the underlying contiguous storage. Even though the "line" iterator is not random-access but bidirectional, we can still jump to the mid_point of such an iterator range, because we can get the start_of_line from any const char* into the underlying mapped region. This makes binary searching efficient.
Note that this solution parses lines on dereference of the iterator. This might not be efficient if the same lines are dereferenced a lot of times.
But for infrequent lookups, or lookups that are not typically in the same region of the input data, this is about as efficient as it can possibly get (doing only the minimum required parsing and O(log n) binary searching), all the while completely bypassing the initial load time by mapping the file instead (no access means nothing needs to be loaded).
#define NDEBUG
#undef DEBUG
#include <boost/iostreams/device/mapped_file.hpp>
#include <boost/utility/string_ref.hpp>
#include <boost/optional.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/iterator/iterator_facade.hpp>
#include <cassert>
#include <iomanip>
#include <iostream>
namespace io = boost::iostreams;
namespace qi = boost::spirit::qi;
template <typename Key, typename Value>
struct text_multi_lookup {
text_multi_lookup(char const* begin, char const* end)
: _map_begin(begin),
_map_end(end)
{
}
private:
friend struct iterator;
enum : char { nl = '\n' };
using rawit = char const*;
rawit _map_begin, _map_end;
rawit start_of_line(rawit it) const {
while (it > _map_begin) if (*--it == nl) return it+1;
assert(it == _map_begin);
return it;
}
rawit end_of_line(rawit it) const {
while (it < _map_end) if (*it++ == nl) return it;
assert(it == _map_end);
return it;
}
public:
struct value_type final {
rawit beg, end;
Key key;
Value value;
boost::string_ref str() const { return { beg, size_t(end-beg) }; }
};
struct iterator : boost::iterator_facade<iterator, boost::string_ref, boost::bidirectional_traversal_tag, value_type> {
iterator(text_multi_lookup const& d, rawit it) : _region(&d), _data { it, nullptr, Key{}, Value{} } {
assert(_data.beg == _region->start_of_line(_data.beg));
}
private:
friend text_multi_lookup;
text_multi_lookup const* _region;
value_type mutable _data;
void ensure_parsed() const {
if (!_data.end)
{
assert(_data.beg == _region->start_of_line(_data.beg));
auto b = _data.beg;
_data.end = _region->end_of_line(_data.beg);
if (!qi::phrase_parse(
b, _data.end,
qi::auto_ >> qi::auto_ >> qi::eoi,
qi::space,
_data.key, _data.value))
{
std::cerr << "Problem in: " << std::string(_data.beg, _data.end)
<< "at: " << std::setw(_data.end-_data.beg) << std::right << std::string(_data.beg,_data.end);
assert(false);
}
}
}
static iterator mid_point(iterator const& a, iterator const& b) {
assert(a._region == b._region);
return { *a._region, a._region->start_of_line(a._data.beg + (b._data.beg -a._data.beg)/2) };
}
public:
value_type const& dereference() const {
ensure_parsed();
return _data;
}
bool equal(iterator const& o) const {
return (_region == o._region) && (_data.beg == o._data.beg);
}
void increment() {
_data = { _region->end_of_line(_data.beg), nullptr, Key{}, Value{} };
assert(_data.beg == _region->start_of_line(_data.beg));
}
};
using const_iterator = iterator;
const_iterator begin() const { return { *this, _map_begin }; }
const_iterator end() const { return { *this, _map_end }; }
const_iterator cbegin() const { return { *this, _map_begin }; }
const_iterator cend() const { return { *this, _map_end }; }
template <typename CompatibleKey>
const_iterator lower_bound(CompatibleKey const& key) const {
auto f(begin()), l(end());
while (f!=l) {
auto m = iterator::mid_point(f,l);
if (m->key < key) {
f = m;
++f;
}
else {
l = m;
}
}
return f;
}
template <typename CompatibleKey>
const_iterator upper_bound(CompatibleKey const& key) const {
return upper_bound(key, begin());
}
private:
template <typename CompatibleKey>
const_iterator upper_bound(CompatibleKey const& key, const_iterator f) const {
auto l(end());
while (f!=l) {
auto m = iterator::mid_point(f,l);
if (key < m->key) {
l = m;
}
else {
f = m;
++f;
}
}
return f;
}
public:
template <typename CompatibleKey>
std::pair<const_iterator, const_iterator> equal_range(CompatibleKey const& key) const {
auto lb = lower_bound(key);
return { lb, upper_bound(key, lb) };
}
};
#include <iostream>
int main() {
io::mapped_file_source map("input.txt");
text_multi_lookup<double, unsigned int> tml(map.data(), map.data() + map.size());
auto const e = tml.end();
for(auto&& line : tml)
{
std::cout << line.str();
auto er = tml.equal_range(line.key);
if (er.first != e) std::cout << " lower: " << er.first->str();
if (er.second != e) std::cout << " upper: " << er.second->str();
}
}
For the curious: here's the disassembly. Note how all the algorithmic stuff is inlined right into main: http://paste.ubuntu.com/9946135/
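Stripped of the iterator_facade machinery, the start_of_line()/mid_point() binary search above can be sketched as plain functions over the raw characters. This is a simplified illustration under the answer's assumptions (keys sorted in increasing order, '\n' line endings) plus no blank lines; std::string_view stands in for the mapped region, std::strtod replaces the Spirit parser, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string_view>

// Walk back from pos to the first character of its line
// (the role start_of_line() plays in text_multi_lookup).
std::size_t line_start(std::string_view text, std::size_t pos) {
    while (pos > 0 && text[pos - 1] != '\n')
        --pos;
    return pos;
}

// Offset of the line after the one starting at pos (text.size() at the end).
std::size_t next_line(std::string_view text, std::size_t pos) {
    while (pos < text.size() && text[pos] != '\n')
        ++pos;
    return pos < text.size() ? pos + 1 : pos;
}

// Offset of the first line whose key is >= key, without any index:
// bisect on byte offsets and snap each midpoint back to its line start.
// first and last always sit on line starts (last may be text.size()).
std::size_t lower_bound_text(std::string_view text, double key) {
    std::size_t first = 0, last = text.size();
    while (first != last) {
        std::size_t mid = line_start(text, first + (last - first) / 2);
        if (std::strtod(text.data() + mid, nullptr) < key)
            first = next_line(text, mid);   // key too small: skip this line
        else
            last = mid;                     // candidate: search the left half
    }
    return first;
}
```

Each probe parses only one key, so a lookup takes O(log(file size)) probes, each scanning at most a line or two of characters.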
As far as I can tell from the Boost documentation, you have misunderstood that function: the cast data_multimap = (std::multimap<double, unsigned int> *)data_file_mapped.data(); will not work. data() only gives you a char* to the raw bytes of the file, and you need to fill the multimap yourself from that text.
Edit, to add a bit more detail: after the mapping, you can do
boost::iostreams::stream<boost::iostreams::mapped_file_source> in(data_file_mapped); // mapped_file_source is not an istream; needs <boost/iostreams/stream.hpp>
std::getline(in, oneString);
Then parse the content of the line (you can use a std::istringstream for that task) and insert it into your multimap.
Repeat the process until the end of the file.
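A minimal sketch of that parse-then-fill approach, using only the standard library. data and size stand in for mapped_file_source::data() and size(); instead of std::getline on a stream wrapper, it splits lines with std::find directly on the mapped bytes and parses each one with a std::istringstream. parse_raster and fired_in_window are hypothetical names, the latter covering the question's [time - 1, time] window.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Parse "time neuron" lines from a raw byte range into a real multimap.
std::multimap<double, unsigned> parse_raster(const char* data, std::size_t size) {
    std::multimap<double, unsigned> result;
    const char* p = data;
    const char* end = data + size;
    while (p < end) {
        const char* eol = std::find(p, end, '\n');
        std::istringstream fields{std::string(p, eol)};   // one line at a time
        double time;
        unsigned neuron;
        if (fields >> time >> neuron)       // skip blank or malformed lines
            result.emplace(time, neuron);
        p = (eol == end) ? end : eol + 1;
    }
    return result;
}

// Neurons that fired in the closed interval [t - 1, t].
std::vector<unsigned> fired_in_window(const std::multimap<double, unsigned>& m, double t) {
    std::vector<unsigned> out;
    for (auto it = m.lower_bound(t - 1), last = m.upper_bound(t); it != last; ++it)
        out.push_back(it->second);
    return out;
}
```

Keep in mind that this builds the whole std::multimap in memory; with per-node overhead it likely needs several times the file size in RAM, and per-line istringstream parsing is slow. Keeping the data mapped avoids both costs.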

Merging two lists efficiently with limited bound

I am trying to merge two arrays/lists where each element of the array has to be compared. If an element is present in both of them, I increase its total occurrence count by one. The arrays are both 2D, where each element has a counter for its occurrences. I know both of these arrays can be compared with a double for loop in O(n^2); however, I am limited to a bound of O(n log n). The final array should contain all of the elements from both lists, with their counters increased if there is more than one occurrence.
Array A[][] = [[8,1],[5,1]]
Array B[][] = [[2,1],[8,1]]
After the merge is complete I should get an array like so
Array C[][] = [[2,1],[8,2],[8,2],[5,1]]
The order of the elements in the result does not matter.
From what I've read, merge sort takes O(n log n) to merge two lists; however, I am currently stuck on meeting this bound. Any pseudocode would be appreciated.
I quite like Stepanov's Efficient Programming lectures, although they are rather slow. In sessions 6 and 7 (if I recall correctly) he discusses the algorithms add_to_counter() and reduce_counter(). Both algorithms are entirely trivial, of course, but they can be used to implement a non-recursive merge sort without too much effort. The only possibly non-obvious insight is that the combining operation can reduce the two elements into a sequence rather than just one element. To do the operations in place, you'd actually store iterators (i.e., pointers in the case of arrays), using a suitable class to represent a partial view of an array.
I haven't watched the sessions beyond session 7 (and actually not even the complete session 7, yet) but I would fully expect that he actually presents how to use the counter produced in session 7 to implement, e.g., merge-sort. Of course, the run-time complexity of merge-sort is O(n ln n) and, when using the counter approach it will use O(ln n) auxiliary space.
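Here is a rough sketch of the binary-counter idea applied to merge sort. It is written from the description above, not taken from Stepanov's lectures, and the names Run, merge_runs, add_to_counter, reduce_counter and counter_merge_sort are illustrative. Slot i of the counter is either empty or holds a sorted run of 2^i elements, so adding an element works like incrementing a binary number, where a carry means merging two runs of equal size.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <utility>
#include <vector>

using Run = std::vector<int>;   // a sorted run

Run merge_runs(const Run& a, const Run& b) {
    Run out;
    out.reserve(a.size() + b.size());
    std::merge(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}

// Increment the counter by one run, propagating "carries" as merges.
void add_to_counter(std::vector<Run>& counter, Run carry) {
    for (Run& slot : counter) {
        if (slot.empty()) {                 // free bit: store the carry, done
            slot = std::move(carry);
            return;
        }
        carry = merge_runs(slot, carry);    // occupied bit: merge, carry on
        slot.clear();
    }
    counter.push_back(std::move(carry));    // overflow: grow the counter
}

// Combine all remaining runs into one sorted sequence.
Run reduce_counter(std::vector<Run>& counter) {
    Run result;
    for (Run& slot : counter)
        if (!slot.empty())
            result = merge_runs(slot, result);
    return result;
}

Run counter_merge_sort(const std::vector<int>& input) {
    std::vector<Run> counter;               // O(log n) slots
    for (int x : input)
        add_to_counter(counter, Run{x});
    return reduce_counter(counter);
}
```

The counter has only O(log n) slots, but this version copies elements into vectors, so it uses O(n) extra memory; the in-place variant described above would store iterator ranges into the original array instead. For the question, you could sort the concatenation of both inputs this way and then sum the counters of adjacent equal values.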
A simple algorithm that requires twice as much memory would be to sort both inputs (O(n log n)), then repeatedly pick the smaller element from the heads of both lists and do the merge (O(n)). The overall cost would be O(n log n) with O(n) extra memory (the additional size of the smaller of the two inputs).
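A minimal sketch of that sort-then-merge approach, assuming each input holds (value, count) pairs with no repeated value inside a single input. It takes the inputs by value for simplicity, so it copies both rather than using only the extra memory for the smaller one, and it emits each value once with its summed counter instead of repeating it as in the question's C. Entry and merge_counts are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Entry = std::pair<int, int>;   // (value, occurrence count)

// Sort both inputs by value (O(n log n)), then walk them in lock-step (O(n)),
// summing the counters of values that appear in both.
std::vector<Entry> merge_counts(std::vector<Entry> a, std::vector<Entry> b) {
    auto by_value = [](const Entry& x, const Entry& y) { return x.first < y.first; };
    std::sort(a.begin(), a.end(), by_value);
    std::sort(b.begin(), b.end(), by_value);

    std::vector<Entry> out;
    out.reserve(a.size() + b.size());
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i].first < b[j].first) {
            out.push_back(a[i++]);
        } else if (b[j].first < a[i].first) {
            out.push_back(b[j++]);
        } else {                            // same value in both: add counters
            out.push_back({a[i].first, a[i].second + b[j].second});
            ++i;
            ++j;
        }
    }
    for (; i < a.size(); ++i) out.push_back(a[i]);   // leftovers, already sorted
    for (; j < b.size(); ++j) out.push_back(b[j]);
    return out;
}
```

With the question's inputs, merge_counts({{8, 1}, {5, 1}}, {{2, 1}, {8, 1}}) yields {2,1} {5,1} {8,2}: two O(n log n) sorts followed by one O(n) pass.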
Here's my algorithm, based on bucket counting:
time complexity: O(n)
memory complexity: O(max), where max is the maximum element in the arrays
Output:
[8,2] [5,1] [2,1] [8,2]
Code:
#include <algorithm>
#include <iostream>
#include <utility>
#include <vector>
// (value, reference to the shared counter for that value)
typedef std::pair<int, int&> CountedRef;
// Append each element of [it, end) to result, adding its occurrence count to
// the bucket for its value. counters must already be big enough: resizing it
// here would invalidate the references already stored in result.
void copyWithCounts(std::vector<std::pair<int, int> >::const_iterator it,
                    std::vector<std::pair<int, int> >::const_iterator end,
                    std::vector<int> &counters,
                    std::vector<CountedRef> &result) {
    for (; it != end; ++it) {
        int &count = counters[it->first];
        count += it->second;
        result.push_back(CountedRef(it->first, count));
    }
}
// Assumes all values are non-negative (they index the buckets).
void countingMerge(const std::vector<std::pair<int, int> > &array1,
                   const std::vector<std::pair<int, int> > &array2,
                   std::vector<int> &counters,
                   std::vector<CountedRef> &result) {
    // Size the buckets once, up front (O(max) memory), so that no later
    // reallocation can leave dangling references in result.
    int maxValue = 0;
    for (const auto &p : array1) maxValue = std::max(maxValue, p.first);
    for (const auto &p : array2) maxValue = std::max(maxValue, p.first);
    counters.assign(maxValue + 1, 0);
    copyWithCounts(array1.begin(), array1.end(), counters, result);
    copyWithCounts(array2.begin(), array2.end(), counters, result);
}
int main()
{
    std::vector<std::pair<int, int> > array1 = {{8, 1}, {5, 1}};
    std::vector<std::pair<int, int> > array2 = {{2, 1}, {8, 1}};
    std::vector<int> counters;        // owned by the caller so it outlives result
    std::vector<CountedRef> result;
    countingMerge(array1, array2, counters, result);
    for (const auto &element : result) {
        std::cout << "[" << element.first << "," << element.second << "] ";
    }
    std::cout << "\n";
    return 0;
}
Short explanation:
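Because you mentioned that the final arrangement doesn't matter, I did a simple merge (no sorting) with counting. The result holds references to the counters, so there is no need to walk through the array again to update them. For this to be safe, the counters vector must outlive the result and must not be resized after references into it have been handed out.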
You could write an algorithm to merge them by walking both sequences sequentially in order, inserting where appropriate.
I've chosen a (seemingly more apt) data structure here: std::map<Value, Occurence>:
#include <map>
using namespace std;
using Value = int;
using Occurence = unsigned;
using Histo = map<Value, Occurence>;
If you insist on contiguous storage, boost::container::flat_map<> should be your friend here (and a drop-in replacement).
The algorithm (tested with your inputs, read comments for explanation):
void MergeInto(Histo& target, Histo const& other)
{
auto left_it = begin(target), left_end = end(target);
auto right_it = begin(other), right_end = end(other);
auto const& cmp = target.value_comp();
while (right_it != right_end)
{
if ((left_it == left_end) || cmp(*right_it, *left_it))
{
// insert at left_it
target.insert(left_it, *right_it);
++right_it; // and carry on
} else if (cmp(*left_it, *right_it))
{
++left_it; // keep left_it first, so increment it
} else
{
// keys match!
left_it->second += right_it->second;
++left_it;
++right_it;
}
}
}
It's really quite straightforward!
A test program:
#include <iostream>
// for debug output
static inline std::ostream& operator<<(std::ostream& os, Histo::value_type const& v) { return os << "{" << v.first << "," << v.second << "}"; }
static inline std::ostream& operator<<(std::ostream& os, Histo const& v) { for (auto& el : v) os << el << " "; return os; }
//
int main(int argc, char *argv[])
{
Histo A { { 8, 1 }, { 5, 1 } };
Histo B { { 2, 1 }, { 8, 1 } };
std::cout << "A: " << A << "\n";
std::cout << "B: " << B << "\n";
MergeInto(A, B);
std::cout << "merged: " << A << "\n";
}
Printing:
A: {5,1} {8,1}
B: {2,1} {8,1}
merged: {2,1} {5,1} {8,2}
You could shuffle the interface a tiny bit in case you really wanted to merge into a new object (C):
// convenience
Histo Merge(Histo const& left, Histo const& right)
{
auto copy(left);
MergeInto(copy, right);
return copy;
}
Now you can just write
Histo A { { 8, 1 }, { 5, 1 } };
Histo B { { 2, 1 }, { 8, 1 } };
auto C = Merge(A, B);