I'm trying to implement a ring buffer (or circular buffer). As with most such implementations, it should be as fast and lightweight as possible while still being robust enough for production use. This is a difficult balance to strike. In particular, I'm faced with the following problem.
I want to use said buffer to store the last n system events. As new events come in, the oldest get deleted. Other parts of my software can then access those stored events and process them at their own pace. Some systems might consume events almost as fast as they arrive; others may only check sporadically. Each system would store an iterator into the buffer so that it knows where it left off the last time it checked. This is no problem as long as they check often enough, but the slower systems especially may often find themselves with an old iterator that points to a buffer element that has since been overwritten, with no way to detect that.
Is there a good (not too costly) way of checking whether any given iterator is still valid?
Things I came up with so far:
keep a list of all iterators and store their valid state (rather costly)
store not only the iterator but also a copy of the pointed-to element in the client of the buffer. On each access, check whether the element is still the same. This can be unreliable: if the element has been overwritten by an identical element, it is impossible to tell whether it has changed. Also, the responsibility for finding a good way to compare elements lies with the client, which is not ideal in my mind.
Many ring buffer implementations don't bother with this at all, or use a single-reader/single-writer idiom where reading consumes the element.
Instead of storing values, store (value, sequence_num) pairs. When you push a new value, always make sure that it uses a different sequence_num. You can use a monotonically increasing integer for sequence_num.
Then, the iterator remembers the sequence_num of the element that it was last looking at. If it doesn't match, it's been overwritten.
I agree with Roger Lipscombe, use sequence numbers.
But you don't need to store (value, sequence_num) pairs: just store the values, and keep track of the highest sequence number so far. Since it's a ring buffer, you can deduce the seq num for all entries.
Thus, the iterators consist simply of a sequence number.
Given Obj the type of object you store in your ring buffer, if you use a simple array, your ring buffer would look like this:
struct RingBuffer {
    Obj buf[RINGBUFFER_SIZE];
    size_t idx_last_element;
    uint32_t seqnum_last_element;

    void Append(const Obj& obj) { // TODO: locking needed if multithreaded
        if (idx_last_element == RINGBUFFER_SIZE - 1)
            idx_last_element = 0;
        else
            ++idx_last_element;
        buf[idx_last_element] = obj; // copy.
        ++seqnum_last_element;
    }
};
And the iterator would look like this:
struct RingBufferIterator {
    const RingBuffer* ringbuf;
    uint32_t seqnum;

    bool IsValid() const {
        return ringbuf &&
               seqnum <= ringbuf->seqnum_last_element &&
               seqnum > ringbuf->seqnum_last_element - RINGBUFFER_SIZE; // TODO: handle seqnum rollover.
    }

    const Obj* ToPointer() const {
        if (!IsValid()) return NULL;
        // idx must be signed, otherwise the wrap-around test below can never fire
        ptrdiff_t idx = (ptrdiff_t)ringbuf->idx_last_element -
                        (ptrdiff_t)(ringbuf->seqnum_last_element - seqnum); // TODO: handle seqnum rollover.
        // handle wrap-around:
        if (idx < 0) return ringbuf->buf + RINGBUFFER_SIZE + idx;
        return ringbuf->buf + idx;
    }
};
A variation of Roger Lipscombe's answer is to use a sequence number as the iterator. The sequence number should be monotonically increasing (take special care of when your integer type overflows) with a fixed step (e.g. 1).
The circular buffer itself would store the data as normal, and would keep track of the oldest sequence number it currently contains (at the tail position).
When dereferencing the iterator, the iterator's sequence number is checked against the buffer's oldest sequence number. If it's bigger or equal (again take special care of integer overflow), the data can be retrieved using a simple index calculation. If it's smaller, it means the data has been overwritten, and the current tail data should be retrieved instead (updating the iterator's sequence number accordingly).
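A minimal sketch of this variation (the type and member names are mine, and a 64-bit counter is used to sidestep the overflow caveat in practice):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Ring buffer where an iterator is just a sequence number; stale
// iterators are detected by comparing against the oldest live seqnum.
struct SeqRing {
    std::vector<int> buf;   // fixed capacity
    uint64_t next_seq = 0;  // seqnum the next push will receive

    explicit SeqRing(size_t cap) : buf(cap) {}

    uint64_t oldest_seq() const {
        return next_seq > buf.size() ? next_seq - buf.size() : 0;
    }

    void push(int v) { buf[next_seq % buf.size()] = v; ++next_seq; }

    // Dereference: if seq has been overwritten, resync it to the tail.
    int read(uint64_t& seq) const {
        if (seq < oldest_seq()) seq = oldest_seq(); // data was lost
        return buf[seq % buf.size()];
    }
};
```

A slow consumer holding a stale sequence number silently resumes at the oldest surviving element, which is exactly the "retrieve the current tail data instead" behaviour described above.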
Related
I want to sort an array with a huge number (millions or even billions) of elements, while the values are integers within a small range (1 to 100 or 1 to 1000). In such a case, are std::sort and the parallelized version __gnu_parallel::sort the best choice for me?
Actually I want to sort a vector of my own class with an integer member representing the processor index.
As there are other members inside the class, even if two objects have the same integer member used for comparison, they might not be regarded as the same data.
Counting sort would be the right choice if you know that your range is so limited. If the range is [0, m), the most efficient way is to have a vector in which the index represents the element and the value the count. For example:
vector<int> to_sort;
vector<int> counts;
for (int i : to_sort) {
    if (counts.size() <= (size_t)i) {
        counts.resize(i + 1, 0);
    }
    counts[i]++;
}
Note that the count at i is lazily initialized but you can resize once if you know m.
If you are sorting objects by some field and they are all distinct, you can modify the above as:
vector<T> to_sort;
vector<vector<const T*>> count_sorted;
for (const T& t : to_sort) {
    const int i = t.sort_field();
    if (count_sorted.size() <= (size_t)i) {
        count_sorted.resize(i + 1, {});
    }
    count_sorted[i].push_back(&t);
}
Now the main difference is that your space requirements grow substantially because you need to store the vectors of pointers. The space complexity goes from O(m) to O(n + m). Time complexity is the same. Note that the algorithm is stable. The code above assumes that to_sort stays in scope during the life cycle of count_sorted. If your Ts implement move semantics you can store the objects themselves and move them in. If you need count_sorted to outlive to_sort you will need to do that or make copies.
If you have a range of type [-l, m), the substance does not change much, but your index now represents the value i + l and you need to know l beforehand.
Finally, it should be trivial to simulate an iteration through the sorted array by iterating through the counts array taking into account the value of the count. If you want stl like iterators you might need a custom data structure that encapsulates that behavior.
Note: in the previous version of this answer I mentioned multiset as a way to use a data structure to count sort. This would be efficient in some java implementations (I believe the Guava implementation would be efficient) but not in C++ where the keys in the RB tree are just repeated many times.
You say "in-place", I therefore assume that you don't want to use O(n) extra memory.
First, count the number of objects with each value (as in Giovanni's and ronaldo's answers). You still need to get the objects into the right locations in-place. I think the following works, but I haven't implemented or tested it:
Create a cumulative sum from your counts, so that you know what index each object needs to go to. For example, if the counts are 1: 3, 2: 5, 3: 7, then the cumulative sums are 1: 0, 2: 3, 3: 8, 4: 15, meaning that the first object with value 1 in the final array will be at index 0, the first object with value 2 will be at index 3, and so on.
The basic idea now is to go through the vector, starting from the beginning. Get the element's processor index, and look up the corresponding cumulative sum. This is where you want it to be. If it's already in that location, move on to the next element of the vector and increment the cumulative sum (so that the next object with that value goes in the next position along). If it's not already in the right location, swap it with the correct location, increment the cumulative sum, and then continue the process for the element you swapped into this position in the vector.
There's a potential problem when you reach the start of a block of elements that have already been moved into place. You can solve that by remembering the original cumulative sums, "noticing" when you reach one, and jumping ahead to the current cumulative sum for that value, so that you don't revisit any elements that you've already swapped into place. There might be a cleverer way to deal with this, but I don't know it.
Finally, compare the performance (and correctness!) of your code against std::sort. This has better time complexity than std::sort, but that doesn't mean it's necessarily faster for your actual data.
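Since the answer above says the idea is untested, here is one possible reconstruction for plain ints with keys in [0, m) (the details, such as the auxiliary `end` array marking where each value's block stops, are my own choices):

```cpp
#include <utility>
#include <vector>

// In-place counting sort: cumulative sums give each value's next free
// slot; misplaced elements are swapped toward their slot until every
// value's block is fully occupied.
void inplace_counting_sort(std::vector<int>& a, int m) {
    std::vector<int> next(m, 0), end(m, 0);
    for (int x : a) ++next[x];          // counts
    int sum = 0;
    for (int v = 0; v < m; ++v) {       // exclusive prefix sums
        int c = next[v];
        next[v] = sum;                  // next free slot for value v
        sum += c;
        end[v] = sum;                   // one past value v's block
    }
    for (int v = 0; v < m; ++v) {
        while (next[v] < end[v]) {
            int x = a[next[v]];
            if (x == v) ++next[v];      // already home, move on
            else std::swap(a[next[v]], a[next[x]++]); // send x to its block
        }
    }
}
```

Each swap places one element at its final position, so the placement phase is O(n) swaps in total.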
You definitely want to use counting sort. But not the one you're thinking of. Its main selling point is that its time complexity is O(N + X), where X is the maximum key value you allow.
Regular old counting sort (as seen in some other answers) can only sort integers, or has to be implemented with a multiset or some other data structure (becoming O(N log N)). But a more general version of counting sort can be used to sort (in place) anything that can provide an integer key, which is perfectly suited to your use case.
The algorithm is somewhat different though, and it's also known as American Flag Sort. Just like regular counting sort, it starts off by calculating the counts.
After that, it builds a prefix sums array of the counts. This is so that we can know how many elements should be placed behind a particular item, thus allowing us to index into the right place in constant time.
Since we know the correct final position of the items, we can just swap them into place. Doing just that would work if there weren't any repetitions, but since it's almost certain that there will be repetitions, we have to be more careful.
First: when we put something into its place we have to increment the value in the prefix sum so that the next element with same value doesn't remove the previous element from its place.
Second: either
keep track of how many elements of each value we have already put into place, so that we don't keep moving elements of values that have already reached their place; this requires a second copy of the counts array (prior to calculating the prefix sum), as well as a "move count" array.
keep a copy of the prefix sums shifted over by one, so that we stop moving elements once the stored position of the latest element reaches the first position of the next value.
Even though the first approach is somewhat more intuitive, I chose the second method (because it's faster and uses less memory).
#include <algorithm> // std::iter_swap
#include <iterator>  // std::distance

template<class It, class KeyOf>
void countsort(It begin, It end, KeyOf key_of) {
    constexpr int max_value = 1000;
    int final_destination[max_value] = {}; // zero initialized
    int destination[max_value] = {};       // zero initialized
    // Record counts
    for (It it = begin; it != end; ++it)
        final_destination[key_of(*it)]++;
    // Build prefix sum of counts
    for (int i = 1; i < max_value; ++i) {
        final_destination[i] += final_destination[i-1];
        destination[i] = final_destination[i-1];
    }
    for (auto it = begin; it != end; ++it) {
        auto key = key_of(*it);
        // while item is not in the correct position
        while (std::distance(begin, it) != destination[key] &&
               // and not all items of this value have reached their final position
               final_destination[key] != destination[key]) {
            // swap into the right place
            std::iter_swap(it, begin + destination[key]);
            // tidy up for next iteration
            ++destination[key];
            key = key_of(*it);
        }
    }
}
Usage:
vector<Person> records = populateRecords();
countsort(records.begin(), records.end(), [](Person const& p){
    return p.id() - 1; // map [1, 1000] -> [0, 1000)
});
This can be further generalized to become MSD Radix Sort,
here's a talk by Malte Skarupke about it: https://www.youtube.com/watch?v=zqs87a_7zxw
Here's a neat visualization of the algorithm: https://www.youtube.com/watch?v=k1XkZ5ANO64
The answer given by Giovanni Botta is perfect, and Counting Sort is definitely the way to go. However, I personally prefer not to go resizing the vector progressively, but I'd rather do it this way (assuming your range is [0-1000]):
vector<int> to_sort;
vector<int> counts(1001);
int maxvalue = 0;
for (int i : to_sort) {
    if (i > maxvalue) maxvalue = i;
    counts[i]++;
}
counts.resize(maxvalue + 1);
It is essentially the same, but no need to be constantly managing the size of the counts vector. Depending on your memory constraints, you could use one solution or the other.
Can I use popFront() and then eventually push back what was popped? The number of calls to popFront() might be more than one (but not much greater than that, say < 10, if it matters). This is also the number of times the imaginary pushBack() function will be called.
for example:
string s = "Hello, World!";
int n = 5;
foreach(i; 0 .. n) {
    // do something with s.front
    s.popFront();
}
if(some_condition) {
    foreach(i; 0 .. n) {
        s.pushBack();
    }
}
writeln(s); // should output "Hello, World!" since the number popped equals the number pushed back
I think popFront() does use .ptr, but I'm not sure whether that makes any difference in D or can help me reach my goal easily (i.e., in D's way, rather than writing my own circular buffer or the like).
A completely different approach to reach it is very welcome too.
A range is either generative (e.g. if it's a list of random numbers), or it's a view into a container. In neither case does it make sense to push anything onto it. As you call popFront, you're iterating through the list and shrinking your view of the container. If you think of a range being like two C++ iterators for a moment, and you have something like
struct IterRange(T)
{
    @property bool empty() { return iter == end; }
    @property T front() { return *iter; }
    void popFront() { ++iter; }
    private Iterator iter;
    private Iterator end;
}
then it will be easier to understand. If you called popFront, it would move the iterator forward by one, thereby changing which element you're looking at, but you can't add elements in front of it. That would require doing something like an insertion on the container itself, and maybe the iterator or range could be used to tell the container where you want an element inserted, but the iterator or range can't do that itself. The same goes if you have a generative range like
struct IncRange(T)
{
    @property bool empty() { return value == T.max; }
    @property T front() { return value; }
    void popFront() { ++value; }
    private T value;
}
It keeps incrementing the value, and there is no container backing it. So, it doesn't even have anywhere that you could push a value onto.
Arrays are a little bit funny because they're ranges but they're also containers (sort of). They have range semantics when popping elements off of them or slicing them, but they don't own their own memory, and once you append to them, you can get a completely different chunk of memory with the same values. So, it is sort of a range that you can add and remove elements from - but you can't do it using the range API. So, you could do something like
str = newChar ~ str;
but that's not terribly efficient. You could make it more efficient by creating a new array at the target size and then filling in its elements rather than concatenating repeatedly, but regardless, pushing something onto the front of an array is not a particularly idiomatic or efficient thing to be doing.
Now, if what you're looking to do is just reset the range so that it once again refers to the elements that were popped off rather than really push elements onto it - that is, open up the window again so that it shows what it showed before - that's a bit different. It's still not supported by the range API at all (you can never unpop anything that was popped off). However, if the range that you're dealing with is a forward range (and arrays are), then you can save the range before you pop off the elements and then use that to restore the previous state. e.g.
string s = "Hello, World!";
int n = 5;
auto saved = s.save;
foreach(i; 0 .. n)
    s.popFront();
if(some_condition)
    s = saved;
So, you have to explicitly store the previous state yourself in order to restore it, instead of having something like unpopFront; having the range store that itself (as would be required for unpopFront) would be very inefficient in most cases (much as it might work in the iterator case if the range kept track of where the beginning of the container was).
No, there is no standard way to "unpop" a range or a string.
If you were to pass a slice of a string to a function:
fun(s[5..10]);
You'd expect that that function would only be able to see those 5 characters. If there was a way to "unpop" the slice, the function would be able to see the entire string.
Now, D is a system programming language, so expanding a slice is possible using pointer arithmetic and GC queries. But there is nothing in the standard library to do this for you.
What is considered an optimal data structure for pushing something in order (so inserts at any position, able to find correct position), in-order iteration, and popping N elements off the top (so the N smallest elements, N determined by comparisons with threshold value)? The push and pop need to be particularly fast (run every iteration of a loop), while the in-order full iteration of the data happens at a variable rate but likely an order of magnitude less often. The data can't be purged by the full iteration, it needs to be unchanged. Everything that is pushed will eventually be popped, but since a pop can remove multiple elements there can be more pushes than pops. The scale of data in the structure at any one time could go up to hundreds or low thousands of elements.
I'm currently using a std::deque and binary search to insert elements in ascending order. Profiling shows it taking up the majority of the time, so something has got to change. std::priority_queue doesn't allow iteration, and hacks I've seen to do it won't iterate in order. Even on a limited test (no full iteration!), the std::set class performed worse than my std::deque approach.
None of the classes I'm messing with seem to be built with this use case in mind. I'm not averse to making my own class, if there's a data structure not to be found in STL or boost for some reason.
edit:
There's two major functions right now, push and prune. push uses 65% of the time, prune uses 32%. Most of the time used in push is due to insertion into the deque (64% out of 65%). Only 1% comes from the binary search to find the position.
template<typename T, size_t Axes>
void Splitter<T, Axes>::SortedData::push(const Data& data) // 65% of processing
{
    size_t index = find(data.values[(axis * 2) + 1]);
    this->data.insert(this->data.begin() + index, data); // 64% of all processing happens here
}

template<typename T, size_t Axes>
void Splitter<T, Axes>::SortedData::prune(T value) // 32% of processing
{
    auto top = data.begin(), end = data.end(), it = top;
    for (; it != end; ++it)
    {
        Data& data = *it;
        if (data.values[(axis * 2) + 1] > value) break;
    }
    data.erase(top, it);
}

template<typename T, size_t Axes>
size_t Splitter<T, Axes>::SortedData::find(T value)
{
    size_t start = 0;
    size_t end = this->data.size();
    if (!end) return 0;
    size_t diff;
    while ((diff = (end - start) >> 1))
    {
        size_t mid = diff + start;
        if (this->data[mid].values[(axis * 2) + 1] <= value)
        {
            start = mid;
        }
        else
        {
            end = mid;
        }
    }
    return this->data[start].values[(axis * 2) + 1] <= value ? end : start;
}
With your requirements, a hybrid data structure tailored to your needs will probably perform best. As others have said, contiguous memory is very important, but I would not recommend keeping the array sorted at all times. I propose you use 3 buffers (1 std::array and 2 std::vectors):
1 (constant-size) Buffer for the "insertion heap". Needs to fit into the cache.
2 (variable-sized) Buffers (A+B) to maintain and update sorted arrays.
When you push an element, you add it to the insertion heap via std::push_heap. Since the insertion heap is constant size, it can overflow. When that happens, you std::sort it backwards and std::merge it with the already sorted-sequence buffer (A) into the third (B), resizing them as needed. That will be the new sorted buffer and the old one can be discarded, i.e. you swap A and B for the next bulk operation. When you need the sorted sequence for iteration, you do the same. When you remove elements, you compare the top element in the heap with the last element in the sorted sequence and remove that (which is why you sort it backwards, so that you can pop_back instead of pop_front).
For reference, this idea is loosely based on sequence heaps.
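A rough sketch of that scheme for ints (all names and the heap capacity are my own choices; a real version would template on the element type and comparator, and tune kHeapCap to the cache):

```cpp
#include <algorithm>
#include <functional>
#include <iterator>
#include <vector>

// Bounded insertion heap + bulk-merged sorted run, as described above.
// The sorted run is kept descending so the minimum is removable via pop_back.
struct SequenceBuffer {
    static constexpr size_t kHeapCap = 16;
    std::vector<int> heap;    // min-heap (std::greater) of recent pushes
    std::vector<int> sorted;  // descending; minimum sits at back()

    void flush() { // merge the heap into the sorted run in one pass
        std::sort(heap.begin(), heap.end(), std::greater<int>()); // descending
        std::vector<int> merged;
        merged.reserve(heap.size() + sorted.size());
        std::merge(heap.begin(), heap.end(), sorted.begin(), sorted.end(),
                   std::back_inserter(merged), std::greater<int>());
        sorted.swap(merged);
        heap.clear();
    }

    void push(int v) {
        heap.push_back(v);
        std::push_heap(heap.begin(), heap.end(), std::greater<int>());
        if (heap.size() >= kHeapCap) flush(); // overflow: bulk merge
    }

    // Pop every element <= threshold, smallest first.
    std::vector<int> prune(int threshold) {
        flush(); // simplest correct choice; cheap if the heap is small
        std::vector<int> out;
        while (!sorted.empty() && sorted.back() <= threshold) {
            out.push_back(sorted.back());
            sorted.pop_back();
        }
        return out;
    }
};
```

Note this sketch flushes on every prune for simplicity; the answer's refinement of comparing the heap top against the sorted run's last element avoids that in the common case.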
Have you tried messing around with std::vector? As weird as it may sound, it could actually be pretty fast because it uses contiguous memory. If I remember correctly, Bjarne Stroustrup was talking about this at Going Native 2012 (http://channel9.msdn.com/Events/GoingNative/GoingNative-2012/Keynote-Bjarne-Stroustrup-Cpp11-Style but I'm not 100% sure that it's in this video).
You save time with the binary search, but the insertion in random positions of the deque is slow. I would suggest an std::map instead.
From your edit, it sounds like the delay is in copying. Is it a complex object? Can you heap-allocate it and store pointers in the structure so each entry is created only once? You'll need to provide a custom comparator that takes pointers, as the object's operator<() wouldn't be called. (The custom comparator can simply call operator<().)
EDIT:
Your own figures show it's the insertion that takes the time, not the 'sorting'. While some of that insertion time is creating a copy of your object, some (possibly most) is creation of the internal structure that will hold your object, and I don't think that will change between list/map/set/queue etc. If you can predict the likely eventual/maximum size of your data set, can write or find your own sorting algorithm, and the time is being lost in allocating objects, then vector might be the way to go.
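To illustrate the pointer-indirection idea (the types here are hypothetical stand-ins, not the asker's actual classes):

```cpp
#include <set>

// Keep each (expensive-to-copy) object in one place and order a container
// of pointers with a comparator that forwards to the object's operator<().
struct Entry {
    int key;
    char payload[256]; // stands in for a costly-to-copy object
    bool operator<(const Entry& o) const { return key < o.key; }
};

struct PtrLess {
    bool operator()(const Entry* a, const Entry* b) const { return *a < *b; }
};

// Inserting moves only a pointer; the Entry itself is never copied.
using EntrySet = std::multiset<const Entry*, PtrLess>;
```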
I discovered earlier today that the iterators of a boost::circular_buffer were not behaving quite as I expected them to in a multi-threaded environment (although, to be fair, they behave differently than I would think in a single-threaded program too).
Suppose you make calls to buffer.begin() and buffer.end() to obtain iterators for looping over a specific segment of data. The value of the end iterator changes if more data is added to the circular_buffer. Obviously you would expect to get a different result if you made another call to end() after the data was changed. But what is confusing is that the value of the iterator object you already made changes.
Is there a way to create iterators that allow you to work on a set range of data in a circular_buffer and do not have their values changed even if additional data is added to the buffer while working on a range?
If not, is there a preferred 'non-iterator' pattern that might apply, or a different container class that might allow this?
#include "stdafx.h"
#include <boost/thread.hpp>
#include <boost/thread/thread_time.hpp>
#include <boost/circular_buffer.hpp>

int _tmain(int argc, _TCHAR* argv[])
{
    boost::circular_buffer<int> buffer(20);
    buffer.push_back(99);
    buffer.push_back(99);
    buffer.push_back(99);

    boost::circular_buffer<int>::const_iterator itBegin = buffer.begin();
    boost::circular_buffer<int>::const_iterator itEnd = buffer.end();

    int count = itEnd - itBegin;
    printf("itEnd - itBegin == %i\n", count); // prints 3

    /* another thread (or this one) pushes an item */
    buffer.push_back(99);

    /* we check the values of begin and end again: they've changed, even though we have not done anything in this thread */
    count = itEnd - itBegin;
    printf("itEnd - itBegin == %i\n", count); // prints 4
}
Update for more detailed response to ronag
I am looking for a producer/consumer type model, and the bounded buffer example shown in the Boost documentation is close to what I need, with the following two exceptions.
I need to be able to read more than one element of data off the buffer at a time, inspect them (which may not be trivial), and then choose how many items to remove from the buffer.
fake example: trying to process the two words hello & world from a string of characters.
read 1, buffer contains h-e-l, do not remove chars from the buffer.
-more producing
read 2, buffer contains h-e-l-l-o-w-o, I found 'hello' remove 5 chars, buffer now has w-o
-more producing
read 3, buffer contains w-o-r-l-d, process 'world', remove another 5 characters.
I need to not block the producer thread when reading, unless the buffer is full.
According to the boost::circular_buffer::iterator docs, your iterators should remain valid. (Always the first thing I check when mutating and iterating a container at the same time.) So your example code is legal.
What is happening is due to STL iterator convention: end() doesn't point at an element, but rather to the imaginary one-past-the-last element. After the fourth push_back, the distance from itBegin (the first element) to itEnd (one-past-the-last element) has increased.
One solution may be to hold an iterator pointing to a concrete element, e.g.
itPenultimate = itEnd - 1;
Now the distance from itBegin to itPenultimate will remain the same even when the circular buffer is extended, as long as the element it points to has not been overwritten.
The doc explicitly states that circular buffer is not thread-safe and that the user is responsible for locking the data structure. That would remedy your problem.
But maybe a producer-consumer style queue is better suited for your problem.
How do you push items to the front of the array (like a stack) without starting at MAXSIZE-1? I've been trying to use the modulus operator to do so.
bool quack::pushFront(const int nPushFront)
{
    if (count == maxSize) // indicates a full array
    {
        return false;
    }
    else if (count == 0)
    {
        ++count;
        items[0].n = nPushFront;
        return true;
    }
    intBack = intFront;
    items[++intBack] = items[intFront];
    ++count;
    items[(top + count + maxSize) % maxSize].n = nPushFront;
    /*
    for (int shift = count - 1; shift >= 0; --shift)
    {
        items[shift] = items[shift-1];
    }
    items[top+1].n = nPushFront; */
    return true;
}
"quack" meaning a cross between a queue and a stack. I cannot simply shift my elements by 1 because it is terribly inefficient. I've been working on this for over a month now. I just need some guidance to push_front by using the modulus operator... I don't think a loop is even necessary.
It's funny because I will need to print the list randomly. So if I start adding values at the MAXSIZE-1 element of my integer array and then need to print the array, I will have garbage values.
not actual code:
pushFront(2);
pushFront(4);
cout << q;
If we started adding from the back, I would get several null values.
I cannot simply shift the array elements down or up by one.
I can't use the STL or Boost.
Not sure what your problem is. Are you trying to implement a queue (which also can work as a stack, no need for your quack) as a ring buffer?
In that case, you need to save both a front and a back index. The mechanics are described in the article linked above. Pay attention to the “Difficulties” section: in particular, you need to either have an extra variable or pay attention to leave one field empty – otherwise you won’t know how to differentiate between a completely empty and a completely full queue.
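A minimal sketch of such a front/back ring buffer, using a count to tell empty from full (the names are illustrative, not the asker's actual class):

```cpp
#include <cstddef>

// Fixed-capacity double-ended ring buffer ("quack"). `front` is the index
// of the first element; `count` distinguishes empty (0) from full (N).
template <typename T, size_t N>
struct Quack {
    T items[N];
    size_t front = 0;
    size_t count = 0;

    bool pushFront(const T& v) {
        if (count == N) return false;
        front = (front + N - 1) % N; // step back with wraparound, no branch on 0
        items[front] = v;
        ++count;
        return true;
    }

    bool pushBack(const T& v) {
        if (count == N) return false;
        items[(front + count) % N] = v;
        ++count;
        return true;
    }

    T& at(size_t i) { return items[(front + i) % N]; } // i-th from the front
};
```

Adding `N` before taking the modulus is the trick that makes "index minus one" wrap to `N-1` with unsigned arithmetic, so no loop or element shifting is ever needed.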
Well, it seems kind of silly to rule out the STL, since std::deque is exactly what you want: constant-time random access and amortized constant insert/removal at both the front and the back.
This can be achieved with an array with extra space at the beginning and end. When you run out of space at either end, allocate a new array with twice the space and copy everything over, again with space at both the end and the beginning. You need to keep track of the beginning index and the end index in your class.
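A sketch of that doubling-with-headroom idea (details such as the initial size and the recentring policy are my own choices):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Array with slack at both ends; [head, tail) holds the live elements.
// When either end runs out, reallocate at twice the size and recentre.
struct Gapped {
    std::vector<int> buf;
    size_t head, tail;

    Gapped() : buf(8), head(3), tail(3) {} // start with slack on both sides

    void grow() {
        std::vector<int> bigger(buf.size() * 2);
        size_t n = tail - head;
        size_t newHead = (bigger.size() - n) / 2; // recentre the live range
        std::copy(buf.begin() + head, buf.begin() + tail,
                  bigger.begin() + newHead);
        buf.swap(bigger);
        head = newHead;
        tail = newHead + n;
    }

    void pushFront(int v) { if (head == 0) grow(); buf[--head] = v; }
    void pushBack(int v)  { if (tail == buf.size()) grow(); buf[tail++] = v; }
    size_t size() const { return tail - head; }
    int& operator[](size_t i) { return buf[head + i]; }
};
```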
It seems to me that you have some conflicting requirements:
You have to push to the head of a C++ array primitive.
Without shifting all of the existing elements.
Maintain insertion order.
Short answer: You can't do it, as the above requirements are mutually exclusive.
One of these requirements has to be relaxed.
To help you without having to guess, we need more information about what you are trying to do.