I've been scouring the net looking for a container that handles this scenario best:
Linear memory (no gaps like an object pool or allocator would have)
Some way to give out a reference to an object in the container that remains valid across adds/removals, or a way to search quickly for the original object.
Decently fast adds to end and removals from middle (but no inserts required)
So far the only solution I've been able to find is to use a std::vector and, when a removal takes place, update all reference indices above the index being removed. This just seems bad; I'm looking for any other solution that would be more efficient.

Here is a horrible idea. I haven't tried it at all, so there are probably more than a few bugs.
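#include <cstddef>
#include <memory>
#include <tuple>
#include <utility>
#include <vector>
template <typename T>
class InsaneContainer {
public:
    class MemberPointer {
        friend class InsaneContainer;
        std::size_t idx_;
        InsaneContainer* parent_;
        MemberPointer(InsaneContainer* parent, std::size_t idx) : idx_(idx), parent_(parent) {}
    public:
        T& operator*() { return std::get<0>(parent_->members_[idx_]); }
    };
    using Handle = std::shared_ptr<MemberPointer>;
    Handle insert(const T& t) {
        members_.push_back(std::make_tuple(t, Handle{new MemberPointer{this, members_.size()}}));
        return std::get<1>(members_.back());
    }
    Handle GetHandle(std::size_t idx) {
        return std::get<1>(members_[idx]);
    }
    void erase(std::size_t idx) { // renamed: "delete" is a keyword
        // swap with end, repoint the moved element's handle, drop the last slot
        std::swap(members_[idx], members_.back());
        std::get<1>(members_[idx])->idx_ = idx;
        members_.pop_back(); // handles to the erased element dangle from here on
    }
private:
    std::vector<std::tuple<T, Handle>> members_;
};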
The idea is that, at insertion time, you receive a handle that always gives you O(1) find and delete. While it is otherwise O(n) to find an object, once you have found it you can get its handle, which will stay up to date.
Such a structure is unusual, to say the least, so I suspect an XY problem here.

Through lots of performance testing I found the fastest method for the general case below:
1.) Use a pool allocator that stores free memory regions.
2.) Use free memory region list to copy occupied data linearly into temporary memory at every "gap" in the pool.
This works best for me due to the nature of add/removes in my program (resulting in low fragmentation)
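A rough sketch of how I read the two steps above, not the poster's actual code. GapPool, add, remove and packed are names made up for illustration, and it assumes each slot is removed at most once before being reused:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical pool: slots live in one vector, freed slot indices are kept
// in a list so they can be reused on add or skipped when packing.
template <typename T>
class GapPool {
public:
    std::size_t add(const T& value) {
        if (!free_.empty()) {                 // reuse a gap first
            std::size_t idx = free_.back();
            free_.pop_back();
            slots_[idx] = value;
            return idx;
        }
        slots_.push_back(value);
        return slots_.size() - 1;
    }

    // The slot index of every other element stays stable.
    void remove(std::size_t idx) { free_.push_back(idx); }

    // Copy the occupied runs between gaps into one contiguous temporary.
    std::vector<T> packed() const {
        std::vector<std::size_t> gaps(free_);
        std::sort(gaps.begin(), gaps.end());
        std::vector<T> out;
        out.reserve(slots_.size() - gaps.size());
        std::size_t begin = 0;
        for (std::size_t gap : gaps) {
            out.insert(out.end(),
                       slots_.begin() + static_cast<std::ptrdiff_t>(begin),
                       slots_.begin() + static_cast<std::ptrdiff_t>(gap));
            begin = gap + 1;
        }
        out.insert(out.end(),
                   slots_.begin() + static_cast<std::ptrdiff_t>(begin),
                   slots_.end());
        return out;
    }

private:
    std::vector<T> slots_;
    std::vector<std::size_t> free_;
};
```

With few gaps, packed() does a handful of large range copies, which is probably why low fragmentation matters for this approach.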


Converting "handle" to void*, how to create and store? When to delete?

I have created a pool data structure for a game engine, as explained in:-
In short, the structure stores values, instead of pointers.
Here is a draft.
template<class T> class Handle{
public:
    int id;
};
template<class T> class PackArray{
    std::vector<int> indirection; //promote indirection here
    std::vector<T> data;
    //... some fields for pooling (for recycling instance of T)
public:
    Handle<T> create(){
        //.... update indirection ...
        return Handle<T>( .... index , usually = indirection.size()-1 .... );
    }
    T* get(Handle<T> id){
        //the return result is not stable, caller can't hold it very long
        return &data[indirection[id.id]];
    }
    //... other functions e.g. destroy(Handle<T>) ...
};
Most steps of the refactor to adopt this new data structure are simple, e.g.
Bullet* bullet= new Bullet(); //old version
Handle<Bullet> bullet= packBulletArray.create(); //new version
The problem starts when it comes to interfaces that require a pointer.
As an example, one of those interfaces is the physics engine.
If I want the engine's collision callback, physics engines like Box2D and Bullet Physics require me to pass a void*.
Bullet* bullet= .... ;
physicBody->setUserData(bullet); // <-- it requires void*
Question: How should I change the second line into valid code?
Handle<Bullet> bullet = .... ;
physicBody->setUserData( ???? );
(1) There is no guarantee that this instance of "Handle" will exist in the future.
physicBody->setUserData( &bullet ); //can't do this
//e.g. "Handle<Bullet> bullet" in the above code is a local variable, it will be destroyed soon
(2) There is no guarantee that the underlying object of "bullet" will stay at the same address in the future.
physicBody->setUserData( bullet->get() ); //can't do this
//because "std::vector<T> data" may reallocate in the future
The answer can assume that:
(1) "physicBody" is already encapsulated by me. It can cache generic pointers if required, but caching a value of Handle is not allowed, because it creates severe coupling.
(2) "bullet" has a way to access the correct "physicBody" via its encapsulator. "physicBody" is always deleted before "bullet".
(3) "Handle" also caches a "PackArray*", and this cached pointer is always valid.
I guess the solution is something about unique-pointer / make_pointer, but I don't have enough experience to use them for this problem.
P.S. I care about performance.
For reference, this is a sequel to the question "create dense dynamic array (array of value) as library", which has been solved.
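One direction that might fit, offered only as a sketch and not something from the original thread: since the handle is essentially an integer, the integer itself can be packed into the void* instead of pointing at anything. The Handle below is simplified to a bare id, and toUserData / fromUserData are hypothetical helper names. The integer-to-pointer round trip is implementation-defined, although it behaves as expected on mainstream platforms where a pointer is at least as wide as an int.

```cpp
#include <cstdint>

// Simplified stand-in for the question's Handle (assumption: only the id matters).
template <class T>
struct Handle {
    int id;
};

// Pack the id into the pointer value itself; no allocation, nothing to delete.
template <class T>
void* toUserData(Handle<T> h) {
    return reinterpret_cast<void*>(static_cast<std::intptr_t>(h.id));
}

// Recover the handle inside the collision callback.
template <class T>
Handle<T> fromUserData(void* p) {
    return Handle<T>{static_cast<int>(reinterpret_cast<std::intptr_t>(p))};
}
```

The call site would then look like physicBody->setUserData(toUserData(bullet)), and the callback would turn the void* back into a handle before calling get(). Nothing is heap-allocated, so the "when to delete" part of the question goes away.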

How can I reduce allocation for lookup in C++ map/unordered_map containers?

Suppose I am using std::unordered_map<std::string, Foo> in my code. It's nice and convenient, but unfortunately every time I want to do a lookup (find()) in this map I have to come up with an instance of std::string.
For instance, let's say I'm tokenizing some other string and want to call find() on every token. This forces me to construct an std::string around every token before looking it up, which requires an allocator (std::allocator, which amounts to a CRT malloc()). This can easily be slower than the actual lookup itself. It also contends with other threads since heap management requires some form of synchronization.
A few years ago I found the Boost.intrusive library; it was just a beta version back then. The interesting thing was it had a container called boost::intrusive::iunordered_set which allowed code to perform lookups with any user-supplied type.
I'll explain how I'd like it to work:
struct immutable_string
{
    const char *pf, *pl; // [pf, pl) delimits the token
    struct equals
    {
        bool operator()(const std::string& left, const immutable_string& right) const
        {
            if (left.length() != static_cast<size_t>(right.pl - right.pf))
                return false;
            return std::equal(right.pf, right.pl, left.begin());
        }
    };
    struct hasher
    {
        size_t operator()(const immutable_string& s) const
        {
            return boost::hash_range(s.pf, s.pl);
        }
    };
};
struct string_hasher
{
    size_t operator()(const std::string& s) const
    {
        return boost::hash_range(s.begin(), s.end());
    }
};
std::unordered_map<std::string, Foo, string_hasher> m;
m["abc"] = Foo(123);
immutable_string token; // token refers to a substring inside some other string
auto it = m.find(token, immutable_string::equals(), immutable_string::hasher());
Another thing would be to speed up the "find and insert if not found" use case: the trick with lower_bound() only works for ordered containers. The intrusive container has methods called insert_check() and insert_commit(), but that's for a separate topic I guess.
Turns out boost::unordered_map (as of 1.42) has a find overload that takes CompatibleKey, CompatibleHash, CompatiblePredicate types, so it can do exactly what I asked for here.
When it comes to lexing, I personally use two simple tricks:
I use StringRef (similar to LLVM's) which just wraps a char const* and a size_t and provides string-like operations (only const operations, obviously)
I pool the encountered strings using a bump allocator (using lumps of say 4K)
The two combined are quite efficient, though one needs to understand that all StringRefs that point into the pool are invalidated as soon as the pool is destroyed.
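To make the StringRef idea concrete, here is a minimal sketch. It swaps in an ordered std::map with the transparent std::less<> comparator (C++14), because allocation-free lookup in std::unordered_map needs C++20 transparent hashing. StringRef, Table and countKnownTokens are illustrative names, not LLVM's API.

```cpp
#include <cstddef>
#include <cstring>
#include <functional>
#include <map>
#include <string>

// Hypothetical non-owning view: a pointer plus a length, const operations only.
struct StringRef {
    const char* ptr;
    std::size_t len;
};

// Mixed comparisons so a transparent comparator can order keys against tokens.
inline bool operator<(const std::string& lhs, StringRef rhs) {
    return lhs.compare(0, std::string::npos, rhs.ptr, rhs.len) < 0;
}
inline bool operator<(StringRef lhs, const std::string& rhs) {
    return rhs.compare(0, std::string::npos, lhs.ptr, lhs.len) > 0;
}

// std::less<> is transparent, which enables find() with a non-key type.
using Table = std::map<std::string, int, std::less<>>;

// Count how many space-separated tokens of `text` are keys of `table`,
// without constructing a std::string per token.
inline int countKnownTokens(const Table& table, const char* text) {
    int hits = 0;
    const char* end = text + std::strlen(text);
    for (const char* p = text; p < end;) {
        while (p < end && *p == ' ') ++p;          // skip separators
        const char* tokenBegin = p;
        while (p < end && *p != ' ') ++p;          // scan one token
        if (p != tokenBegin) {
            StringRef token{tokenBegin, static_cast<std::size_t>(p - tokenBegin)};
            if (table.find(token) != table.end()) ++hits;
        }
    }
    return hits;
}
```

The lookups only compare bytes in place, so the tokenizing loop never touches the heap.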

Have an extra data member only when something is active in c++

I have an implementation of a queue, something like template <typename T> queue<T> with a struct QueueItem { T data; }, and I have a separate library that times the passage of data across different places (including from a producer thread to a consumer thread via this queue). To do this, I inserted code from that timing library into the push and pop functions of the queue, so that when they assign the data they also assign an extra member I added, of type void*, to some timing metadata from that library. I.e. what used to be something like:
void push(T t)
{
    QueueItem i;
    i.data = t;
    //insert i into queue
}
became:
void push(T t)
{
    QueueItem i;
    i.data = t;
    void* fox = timingLib.getMetadata();
    i.timingInfo = fox;
    //insert i into queue
}
with QueueItem going from
struct QueueItem
{
    T data;
};
to
struct QueueItem
{
    T data;
    void* timingInfo;
};
What I would like to achieve, however, is the ability to swap out of the latter struct in favor of the lighter weight struct whenever the timing library is not activated. Something like:
if (timingLib.isInactive())
    ; //use the smaller struct QueueItem
else
    ; //use the larger struct QueueItem
as cheaply as possible. What would be a good way to do this?
You can't have a struct that is big and small at the same time, obviously, so you're going to have to look at some form of inheritance or pointer/reference, or a union.
A union would be ideal for you if there's "spare" data in T that could be occupied by your timingInfo. If not, then it's going to be as 'heavy' as the original.
Using inheritance is also likely to end up as big as the original: if you make it polymorphic, it'll add a vtable pointer in there, which will pad it out too much.
So, the next option is to store a pointer only, and have that point to the data you want to store, either the data or the data+timing. This kind of pattern is known as 'flyweight' - where common data is stored separately to the object that is manipulated. This might be what you're looking for (depending on what the timing info metadata is).
The other, more complex, alternative is to have 2 queues that you keep in sync. You store data in one, and the other one stores the associated timing info, if enabled. If not enabled, you ignore the 2nd queue. The trouble with this is ensuring the 2 are kept in sync, but that's an organisational problem rather than a technical challenge. Maybe create a new Queue class that contains the 2 real queues internally.
I'll start by just confirming my assumption that this needs to be a runtime choice and you can't just build two different binaries with timing enabled/disabled. That approach eliminates as much overhead in any approach as possible.
So now let's assume we want different runtime behavior. There will need to be runtime decisions, so there are a couple options. If you can get away with the (relatively small) cost of polymorphism then you could make your queue polymorphic and create the appropriate instance once at startup and then its push for example either will or won't add the extra data.
However if that's not an option I believe you can use templates to help accomplish your end, although there will likely be some up-front work and it will probably increase the size of your binary with the extra code.
You start with a template to add timing to a class:
template <typename Timee>
struct Timed : public Timee
{
    void* timingInfo;
};
Then a timed QueueItem would look like:
Timed<QueueItem> timed_item;
To anything that doesn't care about the timing, this class looks exactly like a QueueItem: It will automatically upcast or slice to the parent as appropriate. And if a method needs to know the timing information you either create an overload that knows what to do for a Timed<T> or do a runtime check (for the "is timing enabled" flag) and downcast to the correct type.
Next, you'll need to change your Queue instantiation to know whether it's using the base QueueItem or the Timed version. For example, a very very rough sketch of a possible mechanism:
template <typename Element>
void run()
{
    Queue<Element> queue;
}
int main()
{
    run<Timed<QueueItem> >();
    return 0;
}
You would "likely" need a specialization for Queue when used with Timed items, unless getting the metadata is stateless, in which case the Timed constructor can gather the info and populate itself when created. Then Queue just stays the same and relies on which instantiation you're using.
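Putting the pieces together, a minimal sketch of how the runtime flag could pick the instantiation once at startup. std::queue stands in for the real Queue, QueueItem is fixed to an int payload, and runWithTiming is a made-up name:

```cpp
#include <cstddef>
#include <queue>

// Stand-in for the question's QueueItem with T = int (assumption).
struct QueueItem {
    int data;
};

// Adds the timing pointer on top of any item type.
template <typename Timee>
struct Timed : public Timee {
    void* timingInfo = nullptr;
};

// The whole pipeline is instantiated once per element type.
template <typename Element>
std::size_t run(int count) {
    std::queue<Element> queue;
    for (int i = 0; i < count; ++i) {
        Element e{};
        e.data = i;
        queue.push(e);
    }
    return queue.size() * sizeof(Element);  // bytes of item storage in use
}

// The runtime flag is checked once, not on every push.
inline std::size_t runWithTiming(bool timingEnabled, int count) {
    return timingEnabled ? run<Timed<QueueItem>>(count)
                         : run<QueueItem>(count);
}
```

The cost is that run() is compiled twice, which is the binary size increase mentioned earlier; the benefit is that the untimed path never carries the extra pointer.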

Iterating a changing container

I am iterating over a set of callback functions. Functions are called during iteration and may lead to drastic changes to the actual container of the functions set.
What I am doing now is:
make a copy of original set
iterate over copy, but for every element check whether it still exists in the original set
Checking for every element's existence is super-dynamic, but seems quite slow too.
Are there other propositions to tackle this case?
Edit : here is the actual code :
// => i = event id
template <class Param>
void dispatchEvent(int i, Param param) {
    EventReceiverSet processingNow;
    const EventReceiverSet& eventReceiverSet = eventReceiverSets[i];
    std::copy(eventReceiverSet.begin(), eventReceiverSet.end(), std::inserter(processingNow, processingNow.begin()));
    while (!processingNow.empty()) {
        EventReceiverSet::iterator it = processingNow.begin();
        IFunction<>* function = it->getIFunction(); /// get function before removing iterator
        processingNow.erase(it); // without this the loop would never terminate
        // is EventReceiver still valid? (may have been removed from original set)
        if (eventReceiverSet.find(ERWrapper(function)) == eventReceiverSet.end()) continue; // not found
        // ... call function with param ...
    }
}
Two basic approaches come to mind:
use a task based approach (with the collection locked, push tasks onto a queue for each element, then release all parties to do work and wait till completion). You'll still need a check to see whether the element for the current task is still present/current in the collection when the task is actually starting.
this could leverage reader-writer locks for the checks, which are usually speedier than full-blown mutual exclusion (especially with more readers than writers)
use a concurrent data structure (I mean, one that is suitable for multithreaded access without explicit locking). The following libraries contain implementations of concurrent data structures:
Intel Threading Building Blocks
MS ConcRT concurrent_vector
libcds Concurrent Data Structures
(adding links shortly)
There is a way to do it in two steps: first, go through the original set, and make a set of action items. Then go through the set of action items, and apply them to the original set.
An action item is a base class with subclasses. Each subclass takes in a set, and performs a specific operation on it, for example:
struct set_action {
    virtual ~set_action() {}
    // take the set by reference, otherwise the action only changes a copy
    virtual void act(std::set<int>& mySet) const = 0;
};
class del_action : public set_action {
    int item;
public:
    del_action(int _item) : item(_item) {}
    virtual void act(std::set<int>& mySet) const {
        mySet.erase(item); // delete item from set
    }
};
class upd_action : public set_action {
    int from, to;
public:
    upd_action(int _from, int _to) : from(_from), to(_to) {}
    virtual void act(std::set<int>& mySet) const {
        mySet.erase(from); // delete [from], insert [to]
        mySet.insert(to);
    }
};
Now you can create a collection of set_action*s in the first pass, and run them in the second pass.
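A minimal sketch of the two passes, assuming C++14 for std::make_unique. The rule inside normalize (drop negatives, double odd values) is made up purely for illustration:

```cpp
#include <memory>
#include <set>
#include <vector>

struct set_action {
    virtual ~set_action() = default;
    virtual void act(std::set<int>& s) const = 0;
};

struct del_action : set_action {
    int item;
    explicit del_action(int i) : item(i) {}
    void act(std::set<int>& s) const override { s.erase(item); }
};

struct upd_action : set_action {
    int from, to;
    upd_action(int f, int t) : from(f), to(t) {}
    void act(std::set<int>& s) const override {
        s.erase(from);
        s.insert(to);
    }
};

// Hypothetical rule: delete negative values, replace odd values by their double.
inline void normalize(std::set<int>& s) {
    std::vector<std::unique_ptr<set_action>> actions;
    for (int v : s) {  // pass 1: read only, just record what to do
        if (v < 0)
            actions.push_back(std::make_unique<del_action>(v));
        else if (v % 2 != 0)
            actions.push_back(std::make_unique<upd_action>(v, v * 2));
    }
    for (const auto& a : actions)  // pass 2: the set is no longer being iterated
        a->act(s);
}
```

Because the set is only modified after the first loop has finished, no iterator is ever invalidated while it is in use.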
The operations which mutate the set structure are insert() and erase().
While iterating, consider using the iterator returned by the mutating operations.
it = myset.erase( it );
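In a full loop that looks roughly like this (a sketch; eraseEven is just an example name, and std::set::erase returns the next iterator only since C++11):

```cpp
#include <set>

// Remove every even value while walking the set once.
inline void eraseEven(std::set<int>& values) {
    for (std::set<int>::iterator it = values.begin(); it != values.end();) {
        if (*it % 2 == 0)
            it = values.erase(it);  // erase hands back the next valid iterator
        else
            ++it;                   // only advance manually when nothing was erased
    }
}
```

Note that this only covers erasing the current element from inside the loop; it does not help if a callback erases some other element that a saved iterator still points to.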

Reducing STL code bloat by wrapping containers

I have a C++ library (with over 50 source files) which uses a lot of STL routines with primary containers being list and vector. This has caused a huge code bloat and I would like to reduce the code bloat by creating a wrapper over the list and vector.
Shown below is my wrapper over std::list and the wrapped instances.
template<typename T>
class wlist
{
    std::list<T> m_list;
public:
    // new iterator set.
    typedef typename std::list<T>::iterator iterator;
    typedef typename std::list<T>::const_iterator cIterator;
    typedef typename std::list<T>::reverse_iterator reverse_iterator;
    unsigned int size () { return m_list.size(); }
    bool empty () { return m_list.empty(); }
    void pop_back () { m_list.pop_back(); }
    void pop_front () { m_list.pop_front(); }
    void push_front (T& item) { m_list.push_front(item); }
    void push_back (T item) { m_list.push_back(item); }
    iterator insert(iterator position, T item) { return m_list.insert(position,item); }
    bool delete_item (T& item);
    // the NULL fallback only compiles when T is a pointer-like type
    T back () { return (m_list.empty()) ? NULL : m_list.back();}
    T front () { return (m_list.empty()) ? NULL : m_list.front();}
    iterator erase(iterator item ) { return m_list.erase(item); }
    iterator begin() { return m_list.begin(); }
    iterator end() { return m_list.end(); }
    reverse_iterator rbegin() { return m_list.rbegin(); }
};
File A:
class label {
    int getPosition(void);
    void setPosition(int x);
    wlist<text*> _elementText; // used in place of list<text> _elementText;
};
File B:
class image {
    void draw_image() {
        wlist<label*>::iterator currentElement = _elementText.begin();
        currentElement++;
    }
};
My belief was that by wrapping the STL container, I would be able to reduce the code bloat but the reduction in code size seems to be insignificant while my motive to wrap the STL was to achieve a code reduction of roughly 20%.
1) By exposing the "wrapped" iterator, have I in turn embedded the STL into my client code, thereby negating all the code savings I was trying to achieve?
2) Have I chosen the right profiling method?
Size before modification:
$ size
text: 813115
data: 99436
bss: 132704
dec : 1045255
hex: ff307
Size after modification:
$ size
text: 806607
data: 98780
bss: 132704
dec : 1038091
hex: fd70b
Firstly, the interface offered by your wrapper is completely and totally disgusting. There's a reason that iterators exist, and it's because your implementation flat out doesn't work for non-pointer types. Returning and taking by value instead of by reference? A terrible design.
Secondly, you can never reduce the size of your program by introducing more code. Your wrapper still uses the STL list under the hood, so you're still instantiating all of those types. Most likely, the compiler just inlined the wrapper and completely removed the whole lot.
Thirdly, you're not even doing an equivalent replacement, because you've replaced what used to be a list of values with a list of pointers, introducing six million lifetime headaches and other problems.
Fourthly, even the idea of code bloat is quite ridiculous on the vast majority of platforms. I, of course, cannot psychically know that you are not working on some embedded platform with hardly any memory (although I doubt you would use many lists on such a platform) but on virtually every system, the size of the code itself is meaningless compared to other assets needed for the program to execute.
What you can do is try something like SCARY iterators or partial specializations for T*.
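For the partial specialization route, a rough sketch of the classic trick: all pointer element types share one real instantiation over void*, and the typed layer is only casts that the compiler can inline away. plist is a made-up name, only a few members are shown, and it assumes pointers to non-const objects:

```cpp
#include <cstddef>
#include <list>

// Primary template left undefined: this sketch only covers pointer elements.
template <typename T>
class plist;

// All the real container code is instantiated exactly once, for void*.
template <>
class plist<void*> {
    std::list<void*> impl_;
public:
    void push_back(void* p) { impl_.push_back(p); }
    void* front() const { return impl_.front(); }
    void pop_front() { impl_.pop_front(); }
    std::size_t size() const { return impl_.size(); }
};

// Every other pointer type is a thin cast layer over plist<void*>.
template <typename T>
class plist<T*> : private plist<void*> {
    typedef plist<void*> base;
public:
    void push_back(T* p) { base::push_back(p); }
    T* front() const { return static_cast<T*>(base::front()); }
    using base::pop_front;
    using base::size;
};
```

plist<label*> and plist<text*> then both reuse the single std::list<void*> instantiation instead of generating one std::list per element type.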
I am trying to imagine why you are concerned with this. Why is it a problem?
My guess is that you have many ( hundreds? ) of different classes, and each one generates a copy of the templated containers.
If this is so and if it is necessary, then sit back and let the compiler do the tedious work for you.
If it is not necessary, then the problem seems likely to be that all your different classes are not necessary. There is a problem with your class design. You might have many different classes that differ only slightly. If the difference is so slight that the extra code generated to handle the difference seems out of proportion, then the different behavior might be better handled by code inside a single class.
It seems that you want to pre-compile your templated wrapper once only in your library, rather than have the compiler instantiate the templated class in every file that uses it. You can do this by moving the member definitions from the header file (where they normally are for templated code) into your .cpp file. This also has the advantage that it reduces compilation times. There is a price in flexibility with this approach, however: you have to know from the beginning the types that you want your class to work for (but you don't want the compiler to figure that out for you, anyway).
Putting templated code into a .cpp file will usually result in linker errors. To avoid these you need to explicitly instantiate the templates that you want the compiler to compile in the .cpp file:
At the end of the .cpp file, you write something like
template class wlist<double>;
template class wlist<int>;
This instructs the compiler to compile these versions of the class (and only these versions).
This of course reduces the flexibility of your library: if you use a wlist<complex>, you will get linker errors.
See here for more info:
I believe this is usually done to reduce compilation times - I imagine it will reduce code bloat too, but I have never used the technique for this reason and so never checked the size of my executable....
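A single-file sketch of the layout described above, with comments marking which part would live in which file. The wlist here is cut down to two members, and the extern template lines (C++11) are an addition of mine that tells client files not to instantiate these types themselves:

```cpp
#include <cstddef>
#include <list>

// ---- wlist.h: what clients include (declarations only) ----
template <typename T>
class wlist {
    std::list<T> m_list;
public:
    void push_back(const T& item);
    std::size_t size() const;
};

// Promise that these instantiations exist in some other object file.
extern template class wlist<int>;
extern template class wlist<double>;

// ---- wlist.cpp: member definitions plus the only instantiations ----
template <typename T>
void wlist<T>::push_back(const T& item) { m_list.push_back(item); }

template <typename T>
std::size_t wlist<T>::size() const { return m_list.size(); }

template class wlist<int>;
template class wlist<double>;
```

Whether this shrinks the final binary depends on how much the linker was already folding duplicate instantiations, so it is worth measuring with size before and after.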