Using STL vector as a FIFO container for byte data - c++

I have a thread running that reads a stream of bytes from a serial port. It does this continuously in the background, and reads from the stream come in at different, separate times. I store the data in a container like so:
using ByteVector = std::vector<std::uint8_t>;
ByteVector receive_queue;
When data comes in from the serial port, I append it to the end of the byte queue:
ByteVector read_bytes = serial_port->ReadBytes(100); // read 100 bytes; returns as a "ByteVector"
receive_queue.insert(receive_queue.end(), read_bytes.begin(), read_bytes.end());
When I am ready to read data in the receive queue, I remove it from the front:
unsigned read_bytes = 100;
// Read 100 bytes from the front of the vector by using indices or iterators, then:
receive_queue.erase(receive_queue.begin(), receive_queue.begin() + read_bytes);
This isn't the full code, but gives a good idea of how I'm utilizing the vector for this data streaming mechanism.
My main concern with this implementation is the removal from the front, which requires shifting each element removed (I'm not sure how optimized erase() is for vector, but in the worst case, each element removal results in a shift of the entire vector). On the flip side, vectors are candidates for CPU cache locality because of the contiguous nature of the data (but CPU cache usage is not guaranteed).
I've thought of maybe using boost::circular_buffer, but I'm not sure if it's the right tool for the job.
I have not yet coded an upper-limit for the growth of the receive queue, however I could easily do a reserve(MAX_RECEIVE_BYTES) somewhere, and make sure that size() is never greater than MAX_RECEIVE_BYTES as I continue to append to the back of it.
Is this approach generally OK? If not, what performance concerns are there? What container would be more appropriate here?

Erasing a from the front of a vector an element at the time can be quite slow, especially if the buffer is large (unless you can reorder elements, which you cannot with a FIFO queue).
A circular buffer is an excellent, perhaps ideal data structure for a FIFO queue of fixed size. But there is no implementation in the standard library. You'll have to implement it yourself or use a third party implementation such as the Boost one you've discovered.
The standard library provides a high level structure for a growing FIFO queue: std::queue. For a lower level data structure, the double ended queue is a good choice (std::deque, which is the default underlying container of std::queue).
On the flip side, vectors are candidates for CPU cache locality because of the contiguous nature of the data (but this is not guaranteed).
The continuous storage of std::vector is guaranteed. A fixed circular buffer also has continuous storage.
I'm not sure what is guaranteed about cache locality of std::deque but it is typically quite good in practice as the typical implementation is a linked list of arrays.

Performance will be poor, which may or may not matter. Taking from the head entails a cascade of moves. But STL has queues for exactly this purpose, just use one.

Related

c++ paged byte queue

I was surprised how hard it is to implement an raw memory fifo.
That is you push x number of bytes into the queue/fifo and then pop y out.
The byte queue has few requirements:
Paged. The queue storage is split into fixed sized blocks.
Dynamic capacity.
Iterable from oldest byte added to newest byte added. (forward iterators)
Empty pages are either freed or recycled as new for push()
Queue size can be computed.
My current existing code has bugs that I don't know how to fix. (I tried and failed)
std::deque<> is the closest STL container but it doesn't provide access to the allocated memory as blocks/pages. It works poorly because the byte queue is used in ring-buffer fashion.
Most useful feature was the queue iteration: iterate(from, end, [](char*area,size_t n){}) that iterates the queue contents in as large chunks as possible.
Edit: For usage clarification: the memory pages may be registered for I/O operations. This means I can't easily deallocate said pages. E.g. std::deque<char> has unspecified page size. I don't know if it possible iterate it such that I know what ranges are infact contiguous in memory for memcpy() type of operations etc.
How do I implement such container or make better use of existing one?

Which C++ stl container should I use?

Imagine the following requirements:
measurement data should be logged and the user should be able to iterate through the data.
uint32_t timestamp;
uint16_t place;
struct SomeData someData;
have a timestamp (uint32_t), a place (uint16_t) and some data in a struct
have a constant number of datasets. If a new one arrives, the oldest is thrown away.
the number of "place" is dynamic, the user can insert new ones during runtime
it should be possible to iterate through the data to the next newer or older dataset but only if the place is the same
need to insert at the end only
memory should be allocated once at program start
insertion need not to be fast but should not block other threads for a long time which might be iterating through the container
memory requirement should be low
EDIT: - The container should all the memory which is not used otherwise, therefore it can be large.
I am not sure which container I should use. It is an embedded system and should not use boost etc.
I see the following possibilities:
std::vector - drawbacks: The insertion at the end requires that all objects are copied and during this time another thread cannot access the vector. Edit: This can be avoided by implementing it as a circular buffer - see comments below. When iterating throught the vector, I have to test the place ID. Maybe it might also be a problem to allocate much memory as one block - because the memory could be segmented?
std::deque - compared to std::vector insertion (and pop_back) is faster but memory requirement? Iterators do not become invalid if the insertion is at the end. But I still have to iterate and test the second ID ("place"). I think it does not need to allocate all the memory in one big block as it is the case with vector or array. If an element is added in front and another one is removed at the end (or removed first and added after), I guess there does no memory allocation take place?
std::queue - instead of deque, I should rather use a queue? Is it true that in many implementations a queue ist implented just as a deque?
std::map - Like deque any iterators to existing elements will not become invalid. If I make the key a combination of place and timestamp, then iteration through the map is maybe faster because it is already sorted? Memory requirements of a map?
std::multimap - as the number of places is not constant I cannot make a multimap with "place" as the index.
std::list - has no advantage over deque here?
Some suggested the use of a circular buffer. If I do not want that the memory is allocated as one big block I still have to use a container and most questions above stay valid.
Update:
I will use a ring buffer as suggested here but using a deque as the underlying container. In order to being able to scroll fast through the datasets with the preselected "place" I will eventually introduce two additional indices into the data struct which will point to the previous and the next index with the same place.
How much memory will be used? In my special case the size of the struct is 56 bytes. The gnu lib uses 512 bytes as minimum block size, the IAR compiler 16 bytes. Hence the used block size will be 512 or 56 bytes respectively. Besides two iterators (using 4 pointers each) and the size there will be a pointer stored for each block. Therefore in the implementation of the iar compiler (block size 56 bytes) there will be 7 % overhead (on a 32 bit system) compared to the use of a std::vector or array. In the gcc implementation there will fit 9 objects in the block (504 bytes) while 512 + 4 bytes are needed per block which is 2 % more.
The block size is not large but the continuous memory size needed for the pointer array is already relatively large, especially for the implementation where one block is one struct.
A std::list would need 2 pointers per struct which is 14 % overhead in my case on 32 bit systems.
std::vector
... the memory could be segmented?
No, std::vector allocates contiguous memory, as is documented in that link. Arrays are also contiguous, but you might just as well use vector for this.
std::deque is segmented, which you said you didn't want. Or do you want to avoid a single large allocated block? It's not clear.
Anyway, it has no benefit over vector if you really want a circular buffer (because you'll never be adding/removing elements from the front/back anyway), and you can't control the block size.
std::queue
... Is it true that in many implementations a queue is implented just as a deque?
Yes, that's the default in all implementations. See the linked documentation or any decent book.
It doesn't sound like you want a FIFO queue, so I don't know why you're considering this one - the interface doesn't match your stated requirement.
'std::map`
... iteration through the map is maybe faster because it is already sorted?
On most modern server/desktop architectures, map will be slower because advancing an iterator involves a pointer chase (which impairs pipelining) and a likely cache miss. Your anonymous embedded architecture may be less sensitive to these effects, so map may be faster for you.
... Memory requirements of a map?
Higher. You have the node size (at least a couple of pointers) added to each element.

Data Structure for steady stream of timestamped data

I have a steady stream of timestamped data, of which I want to always keep the last 5 seconds of data in a buffer.
Furthermore, I would like to provide support for extracting data of a given subinterval of the 5 seconds, so something like
interval = buffer.extractData(startTime, endTime);
What std data structure would be most appropriate for this?
1) The fact that a new sample pushes an old sample out hints that a Queue would be a good data structure
2) The fact that we have to have random access to any elements, in order to obtain the sub interval maybe suggests that vector is appropriate.
Furthermore, what would be a good way to present the subinterval to the user?
My suggestion would be using two iterators?
Unless you are in a fairly performance critical part of the code, a deque would seem reasonable. It can grow and shrink to accommodate changes in your data rate and has reasonable performance for double-ended queue operations and random access.
If the code is performance sensitive (or, even worse, has real-time requirements on top, as is the case with many timestamped buffers), you need to prevent memory allocations as much as possible. You would do this by using a ring buffer with a preallocated array (be it through unique_ptr<T[]> or vector) and either dropping elements when the buffer size is exceeded, or (taking one for the team and) increasing its size.
By never reducing size again, your ring buffer might waste some memory, but remember that in most cases memory is fairly plentiful.
Representing intervals by two iterators or a range object is both common, and although the C++ standard library often prefers iterators, my personal preference is for range objects due to their (in my opinion) slightly better usability.

Why is deque using so much more RAM than vector in C++?

I have a problem I am working on where I need to use some sort of 2 dimensional array. The array is fixed width (four columns), but I need to create extra rows on the fly.
To do this, I have been using vectors of vectors, and I have been using some nested loops that contain this:
array.push_back(vector<float>(4));
array[n][0] = a;
array[n][1] = b;
array[n][2] = c;
array[n][3] = d;
n++
to add the rows and their contents. The trouble is that I appear to be running out of memory with the number of elements I was trying to create, so I reduced the number that I was using. But then I started reading about deque, and thought it would allow me to use more memory because it doesn't have to be contiguous. I changed all mentions of "vector" to "deque", in this loop, as well as all declarations. But then it appeared that I ran out of memory again, this time with even with the reduced number of rows.
I looked at how much memory my code is using, and when I am using deque, the memory rises steadily to above 2GB, and the program closes soon after, even when using the smaller number of rows. I'm not sure exactly where in this loop it is when it runs out of memory.
When I use vectors, the memory usage (for the same number of rows) is still under 1GB, even when the loop exits. It then goes on to a similar loop where more rows are added, still only reaching about 1.4GB.
So my question is. Is this normal for deque to use more than twice the memory of vector, or am I making an erroneous assumption in thinking I can just replace the word "vector" with "deque" in the declarations/initializations and the above code?
Thanks in advance.
I'm using:
MS Visual C++ 2010 (32-bit)
Windows 7 (64-bit)
The real answer here has little to do with the core data structure. The answer is that MSVC's implementation of std::deque is especially awful and degenerates to an array of pointers to individual elements, rather than the array of arrays it should be. Frankly, only twice the memory use of vector is surprising. If you had a better implementation of deque you'd get better results.
It all depends on the internal implementation of deque (I won't speak about vector since it is relatively straightforward).
Fact is, deque has completely different guarantees than vector (the most important one being that it supports O(1) insertion at both ends while vector only supports O(1) insertion at the back). This in turn means the internal structures managed by deque have to be more complex than vector.
To allow that, a typical deque implementation will split its memory in several non-contiguous blocks. But each individual memory block has a fixed overhead to allow the memory management to work (eg. whatever the size of the block, the system may need another 16 or 32 bytes or whatever in addition, just for bookkeeping). Since, contrary to a vector, a deque requires many small, independent blocks, the overhead stacks up which can explain the difference you see. Also note that those individual memory blocks need to be managed (maybe in separate structures?), which probably means some (or a lot of) additional overhead too.
As for a way to solve your problem, you could try what #BasileStarynkevitch suggested in the comments, this will indeed reduce your memory usage but it will get you only so far because at some point you'll still run out of memory. And what if you try to run your program on a machine that only has 256MB RAM? Any other solution which goal is to reduce your memory footprint while still trying to keep all your data in memory will suffer from the same problems.
A proper solution when handling large datasets like yours would be to adapt your algorithms and data structures in order to be able to handle small partitions at a time of your whole dataset, and load/save those partitions as needed in order to make room for the other partitions. Unfortunately since it probably means disk access, it also means a big drop in performance but hey, you can't eat the cake and have it too.
Theory
There two common ways to efficiently implement a deque: either with a modified dynamic array or with a doubly linked list.
The modified dynamic array uses is basically a dynamic array that can grow from both ends, sometimes called array deques. These array deques have all the properties of a dynamic array, such as constant-time random access, good locality of reference, and inefficient insertion/removal in the middle, with the addition of amortized constant-time insertion/removal at both ends, instead of just one end.
There are several implementations of modified dynamic array:
Allocating deque contents from the center of the underlying array,
and resizing the underlying array when either end is reached. This
approach may require more frequent resizings and waste more space,
particularly when elements are only inserted at one end.
Storing deque contents in a circular buffer, and only resizing when
the buffer becomes full. This decreases the frequency of resizings.
Storing contents in multiple smaller arrays, allocating additional
arrays at the beginning or end as needed. Indexing is implemented by
keeping a dynamic array containing pointers to each of the smaller
arrays.
Conclusion
Different libraries may implement deques in different ways, but generally as a modified dynamic array. Most likely your standard library uses the approach #1 to implement std::deque, and since you append elements only from one end, you ultimately waste a lot of space. For that reason, it makes an illusion that std::deque takes up more space than usual std::vector.
Furthermore, if std::deque would be implemented as doubly-linked list, that would result in a waste of space too since each element would need to accommodate 2 pointers in addition to your custom data.
Implementation with approach #3 (modified dynamic array approach too) would again result in a waste of space to accommodate additional metadata such as pointers to all those small arrays.
In any case, std::deque is less efficient in terms of storage than plain old std::vector. Without knowing what do you want to achieve I cannot confidently suggest which data structure do you need. However, it seems like you don't even know what deques are for, therefore, what you really want in your situation is std::vector. Deques, in general, have different application.
Deque can have additional memory overhead over vector because it's made of a few blocks instead of contiguous one.
From en.cppreference.com/w/cpp/container/deque:
As opposed to std::vector, the elements of a deque are not stored contiguously: typical implementations use a sequence of individually allocated fixed-size arrays.
The primary issue is running out of memory.
So, do you need all the data in memory at once?
You may never be able to accomplish this.
Partial Processing
You may want to consider processing the data into "chunks" or smaller sub-matrices. For example, using the standard rectangular grid:
Read data of first quadrant.
Process data of first quandrant.
Store results (in a file) of first quandrant.
Repeat for remaining quandrants.
Searching
If you are searching for a particle or a set of datum, you can do that without reading in the entire data set into memory.
Allocate a block (array) of memory.
Read a portion of the data into this block of memory.
Search the block of data.
Repeat steps 2 and 3 until the data is found.
Streaming Data
If your application is receiving the raw data from an input source (other than a file), you will want to store the data for later processing.
This will require more than one buffer and is more efficient using at least two threads of execution.
The Reading Thread will be reading data into a buffer until the buffer is full. When the buffer is full, it will read data into another empty one.
The Writing Thread will initially wait until either the first read buffer is full or the read operation is finished. Next, the Writing Thread takes data out of the read buffer and writes to a file. The Write Thread then starts writing from the next read buffer.
This technique is called Double Buffering or Multiple Buffering.
Sparse Data
If there is a lot of zero or unused data in the matrix, you should try using Sparse Matrices. Essentially, this is a list of structures that hold the data's coordinates and the value. This also works when most of the data is a common value other than zero. This saves a lot of memory space; but costs a little bit more execution time.
Data Compression
You could also change your algorithms to use data compression. The idea here is to store the data location, value and the number or contiguous equal values (a.k.a. runs). So instead of storing 100 consecutive data points of the same value, you would store the starting position (of the run), the value, and 100 as the quantity. This saves a lot of space, but requires more processing time when accessing the data.
Memory Mapped File
There are libraries that can treat a file as memory. Essentially, they read in a "page" of the file into memory. When the requests go out of the "page", they read in another page. All this is performed "behind the scenes". All you need to do is treat the file like memory.
Summary
Arrays and deques are not your primary issue, quantity of data is. Your primary issue can be resolved by processing small pieces of data at a time, compressing the data storage, or treating the data in the file as memory. If you are trying to process streaming data, don't. Ideally, streaming data should be placed into a file and then processed later.
A historical purpose of a file is to contain data that doesn't fit into memory.

Which STL container should I use for a FIFO?

Which STL container would fit my needs best? I basically have a 10 elements wide container in which I continually push_back new elements while pop_front ing the oldest element (about a million time).
I am currently using a std::deque for the task but was wondering if a std::list would be more efficient since I wouldn't need to reallocate itself (or maybe I'm mistaking a std::deque for a std::vector?). Or is there an even more efficient container for my need?
P.S. I don't need random access
Since there are a myriad of answers, you might be confused, but to summarize:
Use a std::queue. The reason for this is simple: it is a FIFO structure. You want FIFO, you use a std::queue.
It makes your intent clear to anybody else, and even yourself. A std::list or std::deque does not. A list can insert and remove anywhere, which is not what a FIFO structure is suppose to do, and a deque can add and remove from either end, which is also something a FIFO structure cannot do.
This is why you should use a queue.
Now, you asked about performance. Firstly, always remember this important rule of thumb: Good code first, performance last.
The reason for this is simple: people who strive for performance before cleanliness and elegance almost always finish last. Their code becomes a slop of mush, because they've abandoned all that is good in order to really get nothing out of it.
By writing good, readable code first, most of you performance problems will solve themselves. And if later you find your performance is lacking, it's now easy to add a profiler to your nice, clean code, and find out where the problem is.
That all said, std::queue is only an adapter. It provides the safe interface, but uses a different container on the inside. You can choose this underlying container, and this allows a good deal of flexibility.
So, which underlying container should you use? We know that std::list and std::deque both provide the necessary functions (push_back(), pop_front(), and front()), so how do we decide?
First, understand that allocating (and deallocating) memory is not a quick thing to do, generally, because it involves going out to the OS and asking it to do something. A list has to allocate memory every single time something is added, and deallocate it when it goes away.
A deque, on the other hand, allocates in chunks. It will allocate less often than a list. Think of it as a list, but each memory chunk can hold multiple nodes. (Of course, I'd suggest that you really learn how it works.)
So, with that alone a deque should perform better, because it doesn't deal with memory as often. Mixed with the fact you're handling data of constant size, it probably won't have to allocate after the first pass through the data, whereas a list will be constantly allocating and deallocating.
A second thing to understand is cache performance. Going out to RAM is slow, so when the CPU really needs to, it makes the best out of this time by taking a chunk of memory back with it, into cache. Because a deque allocates in memory chunks, it's likely that accessing an element in this container will cause the CPU to bring back the rest of the container as well. Now any further accesses to the deque will be speedy, because the data is in cache.
This is unlike a list, where the data is allocated one at a time. This means that data could be spread out all over the place in memory, and cache performance will be bad.
So, considering that, a deque should be a better choice. This is why it is the default container when using a queue. That all said, this is still only a (very) educated guess: you'll have to profile this code, using a deque in one test and list in the other to really know for certain.
But remember: get the code working with a clean interface, then worry about performance.
John raises the concern that wrapping a list or deque will cause a performance decrease. Once again, he nor I can say for certain without profiling it ourselves, but chances are that the compiler will inline the calls that the queue makes. That is, when you say queue.push(), it will really just say queue.container.push_back(), skipping the function call completely.
Once again, this is only an educated guess, but using a queue will not degrade performance, when compared to using the underlying container raw. Like I've said before, use the queue, because it's clean, easy to use, and safe, and if it really becomes a problem profile and test.
Check out std::queue. It wraps an underlying container type, and the default container is std::deque.
Where performance really matters, check out the Boost circular buffer library.
I continually push_back new elements
while pop_front ing the oldest element
(about a million time).
A million is really not a big number in computing. As others have suggested, use a std::queue as your first solution. In the unlikely event of it being too slow, identify the bottleneck using a profiler (do not guess!) and re-implement using a different container with the same interface.
Why not std::queue? All it has is push_back and pop_front.
A queue is probably a simpler interface than a deque but for such a small list, the difference in performance is probably negligible.
Same goes for list. It's just down to a choice of what API you want.
Use a std::queue, but be cognizant of the performance tradeoffs of the two standard Container classes.
By default, std::queue is an adapter on top of std::deque. Typically, that'll give good performance where you have a small number of queues containing a large number of entries, which is arguably the common case.
However, don't be blind to the implementation of std::deque. Specifically:
"...deques typically have large minimal memory cost; a deque holding just one element has to allocate its full internal array (e.g. 8 times the object size on 64-bit libstdc++; 16 times the object size or 4096 bytes, whichever is larger, on 64-bit libc++)."
To net that out, presuming that a queue entry is something that you'd want to queue, i.e., reasonably small in size, then if you have 4 queues, each containing 30,000 entries, the std::deque implementation will be the option of choice. Conversely, if you have 30,000 queues, each containing 4 entries, then more than likely the std::list implementation will be optimal, as you'll never amortize the std::deque overhead in that scenario.
You'll read a lot of opinions about how cache is king, how Stroustrup hates linked lists, etc., and all of that is true, under certain conditions. Just don't accept it on blind faith, because in our second scenario there, it's quite unlikely that the default std::deque implementation will perform. Evaluate your usage and measure.
This case is simple enough that you can just write your own. Here is something that works well for micro-conroller situations where STL use takes too much space. It is nice way to pass data and signal from interrupt handler to your main loop.
// FIFO with circular buffer
#define fifo_size 4
class Fifo {
uint8_t buff[fifo_size];
int writePtr = 0;
int readPtr = 0;
public:
void put(uint8_t val) {
buff[writePtr%fifo_size] = val;
writePtr++;
}
uint8_t get() {
uint8_t val = NULL;
if(readPtr < writePtr) {
val = buff[readPtr%fifo_size];
readPtr++;
// reset pointers to avoid overflow
if(readPtr > fifo_size) {
writePtr = writePtr%fifo_size;
readPtr = readPtr%fifo_size;
}
}
return val;
}
int count() { return (writePtr - readPtr);}
};