boost::circular_buffer can provide a fixed-length buffer, e.g., of size 5.
Imagine that I have a real-time data stream coming in with timestamps. I want to keep a buffer of all the elements within the last 5 minutes.
Naively, I can build a wrapper around std::list: every time a new data point D comes in, I push_back(D), and then do a while loop to pop_front() all the data points older than 5 minutes.
The problem with such a design is that I have to construct a new instance for every point, which seems to be a waste of time (this is a very heavily used object).
Does anyone here have a more elegant solution?
Thanks!
A list or a deque are both suitable for ring buffers. If your objects are trivially copyable and small, you can just use the deque and probably not worry about the allocations. If you have larger data, you can use the list with a custom object pool (so that old, unused objects are reused for future additions).
If you don't like the standard collections' object-pool semantics (crappy prior to C++11; I'm not sure about now), then you can simply store pointers in the deque and manage your own memory.
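For instance, a minimal sketch of the deque variant (the Sample type, the window length, and all names here are placeholders, not from the question):

#include <chrono>
#include <cstddef>
#include <deque>

using Clock = std::chrono::steady_clock;

struct Sample {               // hypothetical data point
    Clock::time_point when;
    double value;
};

class SlidingWindow {
public:
    explicit SlidingWindow(Clock::duration span) : span_(span) {}

    void push(Sample s) {
        buf_.push_back(std::move(s));
        // Drop everything older than the window; deque reuses its
        // internal blocks, so pop_front() is cheap.
        while (!buf_.empty() && buf_.front().when < buf_.back().when - span_)
            buf_.pop_front();
    }

    std::size_t size() const { return buf_.size(); }

private:
    std::deque<Sample> buf_;
    Clock::duration span_;
};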
Related
A little bit of background first (skip ahead to the boldface if you're bored by this).
I'm trying to glue two pieces of code together. One is a JSON/YML library that makes heavy use of a custom string view object, the other is a piece of code from the early 2000s.
I've been seeing weird behavior for a long time, until I traced it down to a memory issue: the string views I construct in the JSON/YML library take a const char* in their constructor and assume that the memory location of that char array stays constant over the lifetime of the string view. However, some of the std::string objects on which I construct these views are temporary, so that's just not true, and the string view ends up pointing at garbage.
Now, I thought I was being smart and constructed a cache in the form of a std::vector that would hold all the temporary strings; I would construct the string views on these and only clear the cache at the end - easy.
However, I was still seeing garbled strings every now and then, until I found the reason: sometimes, when pushing things to the vector beyond the preallocated size, the vector would be moved to a different memory location, invalidating all the string views. For now, I've settled on preallocating a cache size large enough to avoid any conceivable moving of the vector, but I can see this causing severe and untraceable problems in the future for very large runs. So here's my question:
How can I construct a std::vector<std::string> or any other string container that either avoids being moved in memory altogether, or at least throws an error message if that happens?
Of course, if you feel that I'm going about this whole issue in the wrong way fundamentally, please also let me know how I should deal with this issue instead.
If you're interested, the two pieces of code in question are RapidYAML and the CERN Statistics Library ROOT.
My answer from a similar question: Any way to update pointer/reference value when vector changes capacity?
If you store objects in your vector as std::unique_ptr or std::shared_ptr, you can get an observing pointer to the underlying object with std::unique_ptr::get() (or a reference if you dereference the smart pointer). This way, even though the memory location of the smart pointer changes upon resizing, the observing pointer points to the same object and thus the same memory location.
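As a hedged sketch of that idea (all names are mine, not from either library): storing the strings behind unique_ptr keeps their addresses stable across reallocations, so a raw pointer into one of them survives any number of push_backs:

#include <iostream>
#include <memory>
#include <string>
#include <vector>

int main() {
    std::vector<std::unique_ptr<std::string>> cache;

    cache.push_back(std::make_unique<std::string>("temporary string"));
    const char* view = cache.back()->c_str();   // points into the heap-allocated string

    // Force several reallocations; the unique_ptrs move, but the strings
    // they own stay put, so `view` remains valid.
    for (int i = 0; i < 1000; ++i)
        cache.push_back(std::make_unique<std::string>("filler"));

    std::cout << view << '\n';                  // prints "temporary string"
}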
[...] sometimes, when pushing things to the vector beyond the preallocated size, the vector would be moved to a different memory location, invalidating all the string views.
The reason is that std::vector is required to store its data contiguously in memory. So, if you exceed the current capacity of the vector when adding an element, it will allocate new space in memory (big enough this time) and move all the data there.
What you are subject to is called iterator invalidation.
How can I construct a std::vector or any other string container that either avoids being moved in memory altogether, or at least throws an error message if that happens?
You have at least 3 easy solutions:
If your cache size is supposed to be fixed and is known at compile-time, I would advise you to use std::array instead.
If your cache size is supposed to be fixed but not necessarily known at compile-time, I would advise you to reserve() the required capacity of your std::vector so that you have the guarantee that it will be big enough to never need reallocation.
If your cache size may change, I would advise you to use std::list instead. It is implemented as a (usually doubly) linked list. It will guarantee that the elements will not be relocated in memory.
But since they are not stored contiguously in memory, you'll lose the ability to have direct access to any element (i.e. you'll need to iterate over the list in order to find an element).
Of course, there probably are other solutions (I do not claim this answer to be exhaustive), but these will let you change almost nothing in your code (only the container) and protect your string views from being invalidated.
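To illustrate solution 2, a minimal sketch (StringCache, intern, and the capacity are hypothetical names/values), with an assert as the "throw an error" safeguard asked about:

#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

class StringCache {
public:
    explicit StringCache(std::size_t maxStrings) { cache_.reserve(maxStrings); }

    // Returns a pointer that stays valid for the cache's lifetime,
    // provided the reserved capacity is never exceeded.
    const char* intern(std::string s) {
        assert(cache_.size() < cache_.capacity() && "cache would reallocate");
        cache_.push_back(std::move(s));
        return cache_.back().c_str();
    }

private:
    std::vector<std::string> cache_;
};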
Perhaps use a std::list. Its access methods are slower (at least when iterating), but memory locations are constant. The reason for both is that it does not use contiguous memory.
Alternatively, create a wrapper that wraps a pointer to a string that has been created with "new". That address will also be constant. EDIT: Somehow I managed to miss that what I've just described is pretty much a smart pointer minus automated deletion ;)
Well, sadly, it is impossible to grow a vector while being sure its contents will stay at the same place, on classical OSes at least.
There is the function realloc that tries to keep the same place, but as you can read in the documentation, there is no guarantee of that; only the OS decides.
To solve your problem, you need the concept of a pool, a pool of strings here, that handles the lifetime of your strings.
You may get away with a simple std::list of strings, but it will lead to poor data locality and a lot of independent allocations, which is bad for your performance. These would also be problems with smart pointers.
So if you care about performance, what you implement in your case may be not far from your current implementation, in my opinion. Because you cannot grow the vector without relocation, you should prefer a std::array of a fixed size that you decide at compile time. Then, whenever you need more room, you can create a new one to expand your memory capacity. This is easily implemented as a std::list of such arrays, typically; a sketch follows.
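A minimal sketch of that idea (StringPool, ChunkSize, and add are names I made up): strings live in fixed-size arrays that never move, and a new chunk is appended to a std::list when the current one fills up:

#include <array>
#include <cstddef>
#include <list>
#include <string>
#include <utility>

class StringPool {
    static constexpr std::size_t ChunkSize = 256;   // tunable

    struct Chunk {
        std::array<std::string, ChunkSize> slots;
        std::size_t used = 0;
    };

public:
    // The returned reference stays valid for the pool's lifetime:
    // old chunks are never moved or reallocated.
    const std::string& add(std::string s) {
        if (chunks_.empty() || chunks_.back().used == ChunkSize)
            chunks_.emplace_back();                 // grow without relocating
        Chunk& c = chunks_.back();
        c.slots[c.used] = std::move(s);
        return c.slots[c.used++];
    }

private:
    std::list<Chunk> chunks_;
};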
I don't know if it applies here, but you must be careful if your application can create an unbounded number of strings during its execution, as this may lead to an ever-growing memory pool and, eventually, memory problems. To avoid that, make sure the strings you no longer use can be reused or freed. Sadly, I cannot help you much here, as these rules will depend on your application.
When storing objects in standard collections, what considerations should one think of when deciding between storing values vs. pointers? Assume the owner of these collections (ObjectOwner) is allocated on the heap. For small structs/objects I've been storing values, while for large objects I've been storing pointers. My reasoning was that when standard containers are resized, their contents are copied (small copy OK, big copy bad). Any other things to keep in mind here?
class ObjectOwner
{
public:
SmallObject& getSmallObject(int smallObjectId);
HugeObject* getHugeObject(int hugeObjectId);
private:
std::map<int, SmallObject> mSmallObjectMap;
std::map<int, HugeObject *> mHugeObjectMap;
};
Edit:
an example of the above for more context:
Create/Delete items stored in std::map relatively infrequently (a few times per second)
Get from std::map frequently (once per 10 milliseconds)
small object: < 32 bytes
huge object: > 1024 bytes
I would store objects by value unless I need them through a pointer. Possible reasons:
I need to store object hierarchy in a container (to avoid slicing)
I need shared ownership
There are possibly other reasons, but reasoning by size is only valid for some containers (std::vector, for example), and even there you can make the cost of moving objects minimal (reserve enough room in advance, for example). Your example of object size with std::map does not make any sense, as std::map does not relocate objects when growing.
Note: the return type of a method should not reflect the way you store objects in a container; rather, it should be based on the method's semantics, i.e. what you would do if the object is not found.
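To make that note concrete, here is a hedged sketch of the class above with those semantics spelled out (the object bodies are placeholders): a missing small object is treated as a logic error, while a missing huge object is an expected case the caller checks for:

#include <map>
#include <memory>

struct SmallObject { int x = 0; };          // placeholder, < 32 bytes
struct HugeObject { char payload[2048]; };  // placeholder, > 1024 bytes

class ObjectOwner
{
public:
    // Absent id is a logic error: at() throws std::out_of_range.
    SmallObject& getSmallObject(int smallObjectId)
    {
        return mSmallObjectMap.at(smallObjectId);
    }

    // Absent id is expected: the caller checks for nullptr.
    HugeObject* getHugeObject(int hugeObjectId)
    {
        auto it = mHugeObjectMap.find(hugeObjectId);
        return it == mHugeObjectMap.end() ? nullptr : it->second.get();
    }

private:
    std::map<int, SmallObject> mSmallObjectMap;
    std::map<int, std::unique_ptr<HugeObject>> mHugeObjectMap;
};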
Only your profiler knows the answer for your SPECIFIC case. Using pointers rather than objects is a reliable way of minimising the amount you copy when a copy must be done (be it a resize of a vector or a copy of the whole container); but sometimes you WANT that copy, because it's a snapshot for a thread inside a mutex, and it's faster to copy the container than to hold the mutex and deal with the data.
Some objects might not be storable in any way other than by pointer because they're not copyable.
Any performance gain from using a container of pointers could be offset by the cost of having to write more copy code, or by repeated calls to new().
There's no one-answer-fits-all, and before you worry about performance here, you should establish where the performance problems really are. (Just repeating the point: use a profiler!)
I have a steady stream of timestamped data, of which I want to always keep the last 5 seconds of data in a buffer.
Furthermore, I would like to provide support for extracting data of a given subinterval of the 5 seconds, so something like
interval = buffer.extractData(startTime, endTime);
What std data structure would be most appropriate for this?
1) The fact that a new sample pushes an old sample out hints that a queue would be a good data structure.
2) The fact that we need random access to any element, in order to obtain the subinterval, maybe suggests that a vector is appropriate.
Furthermore, what would be a good way to present the subinterval to the user?
My suggestion would be to use two iterators?
Unless you are in a fairly performance critical part of the code, a deque would seem reasonable. It can grow and shrink to accommodate changes in your data rate and has reasonable performance for double-ended queue operations and random access.
If the code is performance sensitive (or, even worse, has real-time requirements on top, as is the case with many timestamped buffers), you need to prevent memory allocations as much as possible. You would do this by using a ring buffer with a preallocated array (be it through unique_ptr<T[]> or vector) and either dropping elements when the buffer size is exceeded, or (taking one for the team and) increasing its size.
By never reducing size again, your ring buffer might waste some memory, but remember that in most cases memory is fairly plentiful.
Representing intervals by two iterators or a range object is both common, and although the C++ standard library often prefers iterators, my personal preference is for range objects due to their (in my opinion) slightly better usability.
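A hedged sketch of the deque version, with the subinterval handed back as a pair of iterators (all names are mine; timestamps are plain doubles in seconds):

#include <algorithm>
#include <deque>

struct Sample {          // hypothetical timestamped datum
    double time;         // seconds
    double value;
};

class RecentBuffer {
public:
    using ConstIt = std::deque<Sample>::const_iterator;
    struct Range { ConstIt first, last; };   // half-open [first, last)

    void push(Sample s) {
        data_.push_back(s);
        while (!data_.empty() && data_.front().time < s.time - 5.0)
            data_.pop_front();               // keep only the last 5 seconds
    }

    // Samples arrive in time order, so the deque is sorted by time and
    // the interval bounds can be binary-searched in O(log n).
    // Note: the returned iterators are only valid until the next push().
    Range extractData(double startTime, double endTime) const {
        auto cmp = [](const Sample& s, double t) { return s.time < t; };
        ConstIt lo = std::lower_bound(data_.begin(), data_.end(), startTime, cmp);
        ConstIt hi = std::lower_bound(lo, data_.end(), endTime, cmp);
        return {lo, hi};
    }

private:
    std::deque<Sample> data_;
};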
I have a problem I am working on where I need to use some sort of 2 dimensional array. The array is fixed width (four columns), but I need to create extra rows on the fly.
To do this, I have been using vectors of vectors, and I have been using some nested loops that contain this:
array.push_back(vector<float>(4));   // append a new row of 4 floats
array[n][0] = a;
array[n][1] = b;
array[n][2] = c;
array[n][3] = d;
n++;
to add the rows and their contents. The trouble is that I appear to be running out of memory with the number of elements I was trying to create, so I reduced the number that I was using. But then I started reading about deque and thought it would allow me to use more memory because it doesn't have to be contiguous. I changed all mentions of "vector" to "deque" in this loop, as well as in all declarations. But then it appeared that I ran out of memory again, this time even with the reduced number of rows.
I looked at how much memory my code is using, and when I am using deque, the memory rises steadily to above 2GB, and the program closes soon after, even when using the smaller number of rows. I'm not sure exactly where in this loop it runs out of memory.
When I use vectors, the memory usage (for the same number of rows) is still under 1GB, even when the loop exits. It then goes on to a similar loop where more rows are added, still only reaching about 1.4GB.
So my question is: is it normal for deque to use more than twice the memory of vector, or am I making an erroneous assumption in thinking I can just replace the word "vector" with "deque" in the declarations/initializations and the above code?
Thanks in advance.
I'm using:
MS Visual C++ 2010 (32-bit)
Windows 7 (64-bit)
The real answer here has little to do with the core data structure. The answer is that MSVC's implementation of std::deque is especially awful and degenerates to an array of pointers to individual elements, rather than the array of arrays it should be. Frankly, only twice the memory use of vector is surprising. If you had a better implementation of deque you'd get better results.
It all depends on the internal implementation of deque (I won't speak about vector since it is relatively straightforward).
Fact is, deque has completely different guarantees than vector (the most important one being that it supports O(1) insertion at both ends, while vector only supports O(1) insertion at the back). This in turn means the internal structures managed by deque have to be more complex than vector's.
To allow that, a typical deque implementation will split its memory in several non-contiguous blocks. But each individual memory block has a fixed overhead to allow the memory management to work (eg. whatever the size of the block, the system may need another 16 or 32 bytes or whatever in addition, just for bookkeeping). Since, contrary to a vector, a deque requires many small, independent blocks, the overhead stacks up which can explain the difference you see. Also note that those individual memory blocks need to be managed (maybe in separate structures?), which probably means some (or a lot of) additional overhead too.
As for a way to solve your problem, you could try what @BasileStarynkevitch suggested in the comments; this will indeed reduce your memory usage, but it will get you only so far, because at some point you'll still run out of memory. And what if you try to run your program on a machine that only has 256MB of RAM? Any other solution whose goal is to reduce your memory footprint while still trying to keep all your data in memory will suffer from the same problems.
A proper solution when handling large datasets like yours would be to adapt your algorithms and data structures so that you can handle small partitions of your whole dataset at a time, and load/save those partitions as needed to make room for the others. Unfortunately, since that probably means disk access, it also means a big drop in performance, but hey, you can't have your cake and eat it too.
Theory
There are two common ways to efficiently implement a deque: either with a modified dynamic array or with a doubly linked list.
The modified dynamic array approach is basically a dynamic array that can grow from both ends, sometimes called an array deque. These array deques have all the properties of a dynamic array, such as constant-time random access, good locality of reference, and inefficient insertion/removal in the middle, with the addition of amortized constant-time insertion/removal at both ends instead of just one end.
There are several implementations of the modified dynamic array:

1) Allocating deque contents from the center of the underlying array, and resizing the underlying array when either end is reached. This approach may require more frequent resizings and waste more space, particularly when elements are only inserted at one end.

2) Storing deque contents in a circular buffer, and only resizing when the buffer becomes full. This decreases the frequency of resizings.

3) Storing contents in multiple smaller arrays, allocating additional arrays at the beginning or end as needed. Indexing is implemented by keeping a dynamic array containing pointers to each of the smaller arrays.
Conclusion
Different libraries may implement deques in different ways, but generally as a modified dynamic array. Most likely your standard library uses approach #1 to implement std::deque, and since you append elements only at one end, you ultimately waste a lot of space. For that reason, it creates the illusion that std::deque takes up more space than the usual std::vector.
Furthermore, if std::deque were implemented as a doubly linked list, that would result in wasted space too, since each element would need to accommodate 2 pointers in addition to your custom data.
An implementation with approach #3 (a modified dynamic array approach too) would again waste space on additional metadata such as pointers to all those small arrays.
In any case, std::deque is less efficient in terms of storage than plain old std::vector. Without knowing what you want to achieve, I cannot confidently suggest which data structure you need. However, it seems like you don't even need what a deque is for; therefore, what you really want in your situation is std::vector. Deques, in general, have different applications.
Deque can have additional memory overhead over vector because it's made of a few blocks instead of one contiguous block.
From en.cppreference.com/w/cpp/container/deque:
As opposed to std::vector, the elements of a deque are not stored contiguously: typical implementations use a sequence of individually allocated fixed-size arrays.
The primary issue is running out of memory.
So, do you need all the data in memory at once?
You may never be able to accomplish this.
Partial Processing
You may want to consider processing the data into "chunks" or smaller sub-matrices. For example, using the standard rectangular grid:
Read the data of the first quadrant.
Process the data of the first quadrant.
Store the results (in a file) of the first quadrant.
Repeat for the remaining quadrants.
Searching
If you are searching for a particular datum or a set of data, you can do that without reading the entire data set into memory.
Allocate a block (array) of memory.
Read a portion of the data into this block of memory.
Search the block of data.
Repeat steps 2 and 3 until the data is found.
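A sketch of those steps, under the assumption that the data is a flat binary file of doubles (file layout, names, and block size are made up):

#include <cstddef>
#include <cstdio>
#include <vector>

// Scans the file one block at a time, so only BlockSize elements
// are ever in memory at once.
bool containsValue(const char* path, double target) {
    constexpr std::size_t BlockSize = 1 << 20;   // 1M doubles per read
    std::vector<double> block(BlockSize);

    std::FILE* f = std::fopen(path, "rb");
    if (!f) return false;

    bool found = false;
    std::size_t got;
    while (!found && (got = std::fread(block.data(), sizeof(double), BlockSize, f)) > 0) {
        for (std::size_t i = 0; i < got; ++i)
            if (block[i] == target) { found = true; break; }
    }
    std::fclose(f);
    return found;
}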
Streaming Data
If your application is receiving the raw data from an input source (other than a file), you will want to store the data for later processing.
This will require more than one buffer and is more efficient with at least two threads of execution.
The Reading Thread will be reading data into a buffer until the buffer is full. When the buffer is full, it will read data into another empty one.
The Writing Thread will initially wait until either the first read buffer is full or the read operation has finished. Next, the Writing Thread takes data out of the read buffer and writes it to a file. It then starts writing from the next read buffer.
This technique is called Double Buffering or Multiple Buffering.
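A minimal double-buffering sketch under stated assumptions: readChunk() stands in for your real input source and writeToFile() for the output; the handshake between the two threads uses one mutex and one condition variable:

#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

bool readChunk(std::vector<char>& buf) {         // hypothetical input source;
    static int calls = 0;                        // returns false at end of stream
    if (calls == 5) return false;
    buf.assign(1024, static_cast<char>('a' + calls++));
    return true;
}

void writeToFile(const std::vector<char>& buf) { // hypothetical output sink
    std::fwrite(buf.data(), 1, buf.size(), stdout);
}

int main() {
    std::vector<char> buffers[2];
    int full = -1;              // index of the buffer ready for writing, or -1
    bool done = false;
    std::mutex m;
    std::condition_variable cv;

    std::thread reader([&] {
        int idx = 0;
        for (;;) {
            bool more = readChunk(buffers[idx]);      // fill outside the lock
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [&] { return full == -1; });  // writer released the last buffer
            full = more ? idx : -1;
            done = !more;
            cv.notify_one();
            if (!more) return;
            idx = 1 - idx;                            // switch to the other buffer
        }
    });

    std::thread writer([&] {
        for (;;) {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [&] { return full != -1 || done; });
            if (full == -1) return;                   // stream ended, nothing pending
            int idx = full;
            lk.unlock();
            writeToFile(buffers[idx]);                // write outside the lock
            lk.lock();
            full = -1;                                // hand the buffer back
            cv.notify_one();
        }
    });

    reader.join();
    writer.join();
}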
Sparse Data
If there is a lot of zero or unused data in the matrix, you should try using Sparse Matrices. Essentially, this is a list of structures that hold the data's coordinates and the value. This also works when most of the data is a common value other than zero. This saves a lot of memory space; but costs a little bit more execution time.
Data Compression
You could also change your algorithms to use data compression. The idea here is to store the data location, the value, and the number of contiguous equal values (a.k.a. runs). So instead of storing 100 consecutive data points of the same value, you would store the starting position (of the run), the value, and 100 as the quantity. This saves a lot of space but requires more processing time when accessing the data.
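A small sketch of the run-length idea on a flat row of floats (names are mine):

#include <cstdint>
#include <utility>
#include <vector>

// Collapses runs of equal neighbours into (value, count) pairs, so
// 100 consecutive equal samples become a single entry.
std::vector<std::pair<float, std::uint32_t>> rleEncode(const std::vector<float>& row) {
    std::vector<std::pair<float, std::uint32_t>> runs;
    for (float v : row) {
        if (!runs.empty() && runs.back().first == v)
            ++runs.back().second;
        else
            runs.emplace_back(v, 1);
    }
    return runs;
}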
Memory Mapped File
There are libraries that can treat a file as memory. Essentially, they read a "page" of the file into memory. When a request goes outside that "page", they read in another page. All this is performed "behind the scenes". All you need to do is treat the file like memory.
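For example, a POSIX-only sketch using mmap (on Windows the equivalent is CreateFileMapping/MapViewOfFile, or a portable wrapper such as boost::interprocess::mapped_region; the file name and element type are assumptions):

#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main() {
    int fd = open("matrix.bin", O_RDONLY);   // hypothetical data file
    if (fd < 0) return 1;

    struct stat st;
    if (fstat(fd, &st) != 0) return 1;

    // The OS pages pieces of the file in and out on demand; to the
    // program it looks like one big read-only array in memory.
    void* p = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) return 1;
    const float* data = static_cast<const float*>(p);

    std::size_t count = st.st_size / sizeof(float);
    double sum = 0.0;
    for (std::size_t i = 0; i < count; ++i)
        sum += data[i];                      // touching an element faults its page in
    (void)sum;

    munmap(p, st.st_size);
    close(fd);
    return 0;
}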
Summary
Arrays and deques are not your primary issue; the quantity of data is. Your primary issue can be resolved by processing small pieces of data at a time, compressing the data storage, or treating the data in the file as memory. If you are trying to process streaming data, don't. Ideally, streaming data should be placed into a file and then processed later.
A historical purpose of a file is to contain data that doesn't fit into memory.
Is there any pattern for dealing with a lot of object instantiations (40k per second) on a mobile device? I need these objects separately, and they cannot be combined. Reusing objects would probably be a solution. Any hints?
Yes. Keep old objects in a pool and re-use them, if you can.
You will save massive amounts of time due to the cost of memory allocation and deletion.
I think you could consider these design patterns:
Object Pool
Factory
Further info
I hope this helps you too: Object Pooling for Generic C++ classes
If the objects are all the same size, try a simple cell allocator with an intrusive linked list of free nodes:
free:
    add node to head of list

allocate:
    if list is non-empty:
        remove the head of the list and return it
    else:
        allocate a large block of memory
        split it into cells of the required size
        add all but one of them to the free list
        return the other one
If allocation and freeing are all done in a single thread, then you don't need any synchronisation. If they're done in different threads, then possibly 40k context switches per second is a bigger worry than 40k allocations per second ;-)
You can make the cells just "raw memory" (and either use placement new or overload operator new for your class), or else keep the objects initialized at all times, even when they're on the "free list", and assign whatever values you need to the members of "new" ones. Which you do depends on how expensive initialization is, and that is probably the technical difference between a cell allocator and an object pool.
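Here's a hedged C++ sketch of that cell allocator (all names are mine), handing out raw cells that you combine with placement new:

#include <cstddef>
#include <new>
#include <vector>

template <std::size_t CellSize, std::size_t CellsPerBlock = 4096>
class CellAllocator {
    union Cell {
        Cell* next;                        // intrusive link, valid only while free
        unsigned char storage[CellSize];
    };

public:
    void* allocate() {
        if (!freeList_) grow();            // no free cells: carve a new block
        Cell* c = freeList_;               // remove the head of the list
        freeList_ = c->next;
        return c;
    }

    void free(void* p) {                   // add node to head of list
        Cell* c = static_cast<Cell*>(p);
        c->next = freeList_;
        freeList_ = c;
    }

    ~CellAllocator() {
        for (Cell* b : blocks_) ::operator delete(b);
    }

private:
    void grow() {                          // allocate a large block, split into cells
        Cell* block = static_cast<Cell*>(::operator new(sizeof(Cell) * CellsPerBlock));
        blocks_.push_back(block);
        for (std::size_t i = 0; i < CellsPerBlock; ++i)
            free(&block[i]);
    }

    Cell* freeList_ = nullptr;
    std::vector<Cell*> blocks_;
};

// Usage with a hypothetical MyObject:
//   CellAllocator<sizeof(MyObject)> pool;
//   void* mem = pool.allocate();
//   MyObject* obj = new (mem) MyObject();  // placement new
//   obj->~MyObject();
//   pool.free(mem);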
You might be able to use the flyweight pattern if your objects are redundant. This pattern shares memory amongst similar objects. The classical example is the data structure used for graphical representation of characters in a word processing program.
Wikipedia has a summary.
There is an implementation in boost.
Hard to say exactly how to improve your code without more information, but you probably want to check out the Boost Pool libraries. They all provide different ways of quickly allocating memory for different, specific use cases. Choose the one that fits your use case best.
If the objects are the same size, you can allocate a large chunk of memory and use placement new; that will help with the allocation cost, as everything will be in contiguous memory:
Object *pool = static_cast<Object*>(malloc(sizeof(Object) * numberOfObjects)); // malloc returns void*, so cast
for (int i = 0; i < numberOfObjects; i++)
    new (&pool[i]) Object();   // construct in place, no per-object allocation
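One caveat worth adding (not in the original snippet): objects created with placement new are not cleaned up automatically, so when you're done you must run the destructors yourself and then release the raw buffer:

for (int i = 0; i < numberOfObjects; i++)
    pool[i].~Object();   // call each destructor explicitly
free(pool);              // release the raw chunk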
I've used similar patterns for programming stochastic reaction-diffusion systems (millions of object creations per second on a desktop computer) and for real-time image processing (again, hundreds of thousands or millions per second).
The basic idea is as follows:
Create an allocator that allocates large arrays of your desired object; require that this object have a "next" pointer (I usually create a template that wraps the object with a next pointer).
Every time you need an object, get one from this allocator (using placement-new syntax to construct it in the block of memory it hands you).
Every time you're done, give it back to the allocator and place it on a stack.
The allocator gives you something off the stack if the stack is nonempty, or something from its array buffer otherwise. If you run out of buffer, you can either allocate another larger buffer and copy the existing used nodes, or have the allocator maintain a stack of fully-used allocation blocks.
When you are done with all the objects, delete the allocator. Side benefit: you don't need to be sure to free each individual object; they'll all go away. Side cost: you'd better be sure to allocate anything you want to preserve forever on the heap instead of in this temporary buffer (or have a permanent buffer you use).
I generally get performance about 10x better than raw malloc/new when using this approach.
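A hedged sketch of that wrapper idea (names are mine; for simplicity it assumes T is default-constructible, copyable, and trivially destructible, since destroying the allocator frees the blocks without running per-object destructors):

#include <cstddef>
#include <vector>

template <typename T>
class Arena {
    // Wraps T with an intrusive "next" pointer for the free stack.
    // Assumes Node is standard-layout, so &node == &node.value.
    struct Node { T value; Node* next; };
    static constexpr std::size_t BlockSize = 1024;

public:
    T* create(const T& v) {
        Node* n;
        if (freeList_) {                   // reuse a recycled node first
            n = freeList_;
            freeList_ = n->next;
        } else {
            if (blocks_.empty() || used_ == BlockSize) {
                blocks_.push_back(new Node[BlockSize]);   // new array of objects
                used_ = 0;
            }
            n = &blocks_.back()[used_++];
        }
        n->value = v;
        return &n->value;
    }

    void recycle(T* p) {                   // give the object back: push on the stack
        Node* n = reinterpret_cast<Node*>(p);
        n->next = freeList_;
        freeList_ = n;
    }

    ~Arena() {                             // everything goes away at once
        for (Node* b : blocks_) delete[] b;
    }

private:
    std::vector<Node*> blocks_;
    std::size_t used_ = 0;
    Node* freeList_ = nullptr;
};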