Memory management interface - c++

I am writing a small particle system in C++ and am yet unsure about how I should manage the particle related data -- should it be stored in a static or dynamic array, in a linked list, some mixture of both, or whatever else one might think of?
At the moment I don't want to make a choice but would rather like to use an abstract class for memory management that on the one hand provides me with allocation and deallocation routines and on the other hand takes care of deallocation of the supplied resources in its destructor. I hope that in this way I can change between and test different particle management strategies quickly and transparently.
1) Is this a reasonable thing to do?
2) If yes: Are there any libraries that provide such functionality?
Thank you for your help!

For a particle system you may wish to consider using one std::vector per attribute (each coordinate, velocity component, colour channel, etc.), with one element per particle. E.g.
std::vector<float> x(100);
std::vector<float> vx(100);
etc
Instead of
std::vector<Particle> p(100);
This is known as SoA (structure of arrays) rather than AoS (array of structures). The former is more amenable to vectorization.
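A minimal sketch of the two layouts side by side, assuming a particle with just a position x and a velocity vx. The names Particle, Particles and integrate are made up for illustration and do not come from any library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of structures (AoS): one object per particle.
struct Particle {
    float x, vx;
};

// Structure of arrays (SoA): one contiguous array per attribute.
struct Particles {
    std::vector<float> x;
    std::vector<float> vx;

    explicit Particles(std::size_t n) : x(n, 0.0f), vx(n, 0.0f) {}
};

// AoS update: every iteration steps over a whole Particle object.
void integrate(std::vector<Particle>& p, float dt) {
    for (Particle& q : p) q.x += q.vx * dt;
}

// SoA update: unit-stride reads and writes on plain float arrays,
// which compilers can usually auto-vectorize.
void integrate(Particles& p, float dt) {
    for (std::size_t i = 0; i < p.x.size(); ++i) p.x[i] += p.vx[i] * dt;
}
```

Whether the SoA version is actually faster depends on the compiler, the target and how many attributes each loop touches, so it is worth measuring both.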

The rule of thumb is to use std::vector unless you really have a reason to choose something else. At the moment you can stick with it. To control memory management at a low level, you can supply the vector with your own allocator in place of the default std::allocator, which ultimately forwards to operator new. If your main concern is frequent allocation and deallocation of single objects, then you might consider writing your own allocator that hands out fixed-size elements from a pool organized as a linked list, because the general-purpose operator new tends to be less efficient when objects are allocated and deallocated one at a time.
To test different containers is a reasonable thing IMO, however vector should suffice. In order to decide if
1) Is this a reasonable thing to do?
and thus such tests should be covered at all - you have to think about the operations you are going to use extensively.
2) If yes: Are there any libraries that provide such functionality?
I don't know of such a library.

Related

How to achieve cache coherency with an abstract class pointer vector in C++?

I'm making a little game in C++. I found answers on StackExchange sites about cache coherency, and I would like to use it in my game, but I'm using child classes of an abstract class, Entity.
I'm storing all entities in a std::vector so that I can access virtual functions in loops. Entity::update() is a virtual function of Entity overridden by subclasses like PlayerEntity.
In Game.hpp - Private Member Variables:
std::vector<Entity*> mEntities;
PlayerEntity* mPlayer;
In Game.cpp - Constructor:
mPlayer = new PlayerEntity();
mEntities.push_back(mPlayer);
Here's what my update function (in the main loop) looks like:
void Game::update() {
for (Entity* entity : mEntities) {
entity->update(mTimeStep, mGeneralClock.getElapsedTime().asMilliseconds());
}
}
My question is:
How do I make my entity objects sit next to each other in memory, and thus achieve cache coherency?
I tried to simply make the vector of pointers a vector of objects and make the appropriate changes, but then I couldn't use polymorphism for obvious reasons.
Side question: what determines where an object is allocated in memory?
Am I doing the whole thing wrong? If so, how should I store my entities?
Note: I'm sorry if my English is bad, I'm not a native speaker.
Obviously, first measure which parts are even worth optimizing. Not all games are created equal, and not all code within a game is created equal. There is no use in completely restructuring the script that triggers the end boss's death animation to make it use 1 cache line instead of 2. That said...
If you are aiming for optimizing for cache, forget about inheritance and virtual functions. Or at least be critical of them. As you note, creating a contiguous array of polymorphic objects is somewhere between hard & error-prone and completely infeasible (depending on whether subclasses have different sizes).
You can attempt to create a pool, to have nearby entities (in the entities vector) more likely to be close to each other (in memory), but frankly I doubt you'll do much better than a state of the art general-purpose allocator, especially when the entities' size and lifetime varies significantly. A pool would only help if entities adjacent in the vector are allocated back-to-back. But in that case, any standard allocator gives the same locality advantages. It's not like tcmalloc and friends select a random cache line to allocate from just to annoy you.
You might be able to squeeze a bit of memory out of knowing your object types, but this is purely hypothetical and would have to be proven first to justify the effort of implementing it. Also note that a run-of-the-mill pool either assumes that all objects are the same size, or that you never deallocate individual objects. Allowing both puts you halfway towards a general-purpose allocator, at which you're bound to do worse.
You can segregate objects based on their types. That is, instead of a single vector with polymorphic Entitys with virtual functions, have N vectors: vector<Bullet>, vector<Monster>, vector<Loot>, and so on. This is less insane than it sounds for three reasons:
Often, you can pull out the entire business of managing one such vector into a dedicated system. So in the end you might even have a vector<System *> where each System has a vector for one kind of thing, and updates all those things in a single virtual call (delegating to many statically-dispatched calls).
You don't need to represent everything ever in this abstraction. Not every little integer needs to be wrapped in its own type of entity.
If you go further down this route and take hints from entity component systems, you also gain an alternative to inheritance for code reuse (class Monster : Entity {}; class Skeleton : Monster {};) that plays nicer with the hard-earned cache friendliness.
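The per-type vectors and the vector<System *> idea could be sketched roughly as follows. Bullet and Monster here are toy stand-ins, and BulletSystem, MonsterSystem and the fields are assumptions made for the example, not part of the question's code.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct Bullet  { float x = 0.0f, vx = 1.0f; };
struct Monster { float x = 0.0f; int hp = 10; };

// One virtual call per system per frame, not one per entity.
struct System {
    virtual ~System() = default;
    virtual void update(float dt) = 0;
};

struct BulletSystem : System {
    std::vector<Bullet> bullets;  // contiguous, same-sized objects
    void update(float dt) override {
        for (Bullet& b : bullets) b.x += b.vx * dt;  // statically dispatched
    }
};

struct MonsterSystem : System {
    std::vector<Monster> monsters;
    void update(float) override {
        for (Monster& m : monsters) {
            if (m.hp > 0) --m.hp;  // placeholder per-monster logic
        }
    }
};

struct Game {
    std::vector<std::unique_ptr<System>> systems;
    void update(float dt) {
        for (auto& s : systems) s->update(dt);
    }
};
```

The systems themselves are still heap-allocated and reached through pointers, but there are only a handful of them; the hot inner loops run over contiguous storage.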
It is not easy because polymorphism doesn't work well with cache coherency.
I think the best you can do is overload the base class's operator new to allocate memory from a pool. But to do this, you need to know the size of all derived classes, and after some allocating and deallocating you can end up with memory fragmentation, which will lower the gain.
Have a look at Cachegrind, it's a tool that simulates how your program interacts with a machine's cache hierarchy.

Static arrays VS. dynamic arrays in C++11

I know that it's a very old debate that has already been discussed many times all over the world. But I'm currently having trouble deciding between static and dynamic arrays in a particular case. If I weren't using C++11, I would have used static arrays. But I'm now confused since there could be equivalent benefits with both.
First solution:
template<size_t N>
class Foo
{
private:
int array[N];
public:
// Some functions
};
Second solution:
template<size_t N>
class Foo
{
private:
int* array;
public:
// Some functions
};
I can't seem to choose since the two have their own advantages:
Static arrays are faster, and we don't have to care about memory management at all.
Dynamic arrays do not weigh anything as long as memory is not allocated. After that, they are less handy to use than static arrays. But since C++11, we can benefit greatly from move semantics, which we cannot use with static arrays.
I don't think there is one good solution, but I would like to get some advice or just to know what you think of all that.
I will actually disagree with the "it depends". Never use option 2. If you want to use a translation-time constant, always use option 1 or std::array. The one advantage you listed, that dynamic arrays weigh nothing until allocated, is actually a horrible, huge disadvantage, and one that needs to be pointed out with great emphasis.
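As a rough illustration of the std::array variant of option 1, here is a sketch in which the storage is part of the object and is fully initialized by the constructor. The member name data_ and the accessors are assumptions added for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

template <std::size_t N>
class Foo {
public:
    Foo() : data_{} {}  // value-initializes every element to 0

    int& operator[](std::size_t i) { return data_[i]; }
    const int& operator[](std::size_t i) const { return data_[i]; }
    std::size_t size() const { return N; }

private:
    // Storage lives inside the object: no new/delete, no "not yet
    // allocated" state, and copying works without any extra code.
    std::array<int, N> data_;
};
```

A Foo<N> is usable the moment its constructor returns, and the compiler-generated copy and destructor do the right thing.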
Do not ever have objects that have more than one phase of construction. Never, ever. That should be a rule committed to memory through some large tattoo. Just never do it.
When you have zombie objects that are not quite alive yet, though not quite dead either, the complexity in managing their lifetime grows exponentially. You have to check in every method whether it is fully alive, or only pretending to be alive. Exception safety requires special cases in your destructor. Instead of one simple construction and automatic destruction, you've now added requirements that must be checked at N different places (# methods + dtor). And the compiler doesn't care if you check. And other engineers won't have this requirement broadcast, so they may adjust your code in unsafe ways, using variables without checking. And now all these methods have multiple behaviors depending on the state of the object, so every user of the object needs to know what to expect. Zombies will ruin your (coding) life.
Instead, if you have two different natural lifetimes in your program, use two different objects. But that means you have two different states in your program, so you should have a state machine, with one state having just one object and another state with both, separated by an asynchronous event. If there is no asynchronous event between the two points, if they all fit in one function scope, then the separation is artificial and you should be doing single phase construction.
The only case where a translation time size should translate to a dynamic allocation is when the size is too large for the stack. This then gets to memory optimisation, and it should always be evaluated using memory and profiling tools to see what's best. Option 2 will never be best (it uses a naked pointer - so again we lose RAII and any automatic cleanup and management, adding invariants and making the code more complex and easily breakable by others). Vector (as suggested by bitmask) would be the appropriate first thought, though you may not like the heap allocation costs in time. Other options might be static space in your application's image. But again, these should only be considered once you've determined that you have a memory constraint and what to do from there should be determined by actual measurable needs.
Use neither. You're better off using std::vector in nearly any case. In the other cases, that heavily depends on the reason why std::vector would be insufficient and hence cannot be answered generally!
I'm currently having a problem to decide which one I should use more than another in a particular case.
You'll need to consider your options case-by-case to determine the optimal solution for the given context -- that is, a generalization cannot be made. If one container were ideal for every scenario, the other would be obsolete.
As mentioned already, consider using std implementations before writing your own.
More details:
Fixed Length
Be careful of how much of the stack you consume.
May consume more memory, if you treat it as a dynamically sized container.
Fast copies.
Variable Length
Reallocation and resizing can be costly.
May consume more memory than needed.
Fast moves.
The better choice also requires you understand the complexity of creation, copy, assign, etc. of the element types.
And if you do use std implementations, remember that implementations may vary.
Finally, you can create a container for these types which abstract the implementation details and dynamically select an appropriate data member based on the size and context -- abstracting the detail behind a general interface. This is also useful at times to disable features, or to make some operations (e.g. costly copies) more obvious.
In short, you need to know a lot about the types and usage, and measure several aspects of your program to determine the optimal container type for a specific scenario.

Pointers to objects in a set or in a vector - does it matter?

I just came across a situation where I need to store heap-allocated pointers (to a class B) in an STL container. The class that owns the privately held container (class A) also creates the instances of B. Class A will be able to return const pointers to B instances for clients of A.
Now, does it matter if these pointers are stored in a set or a vector? I thought of having a set just to verify that no duplicates are stored but since addresses are stored, two B pointers with the same data can be stored (unless I provide a comparison class for data comparison I presume).
Any thoughts on this (quite vague) subject? What are the pros/cons for the alternatives? Are smart_pointers something to look into?
Please ask me if anything imperative is unclear, thank you!
There's nothing wrong with storing pointers in a standard container - be it a vector, set, map, or whatever. You just have to be aware of who owns that memory and make sure that it's released appropriately. When choosing a container, choose the container that makes the most sense for your needs. vector is great for random access and appending but not so great for inserting elsewhere in the container. list deals with insertions extremely well, but it doesn't have random access. Sets ensure that there are no duplicates in the container and it's sorted (though the sorting isn't very useful if the set holds pointers and you don't give a comparator function) whereas a map is a set of key-value pairs, so sorting and access is done by key. Etc. Etc. Every container has its pros and cons and which is best for a particular situation depends entirely on that situation.
As for pointers, again, having pointers in containers is fine. The issue that you need to worry about is who owns the memory and therefore must worry about freeing it. If there is a clear object that owns what a particular pointer points to, then it should probably be that object which frees it. If it's essentially the container which owns the memory, then you need to make sure that you delete all of the pointers in the container before the container is destroyed.
If you are concerned with there being multiple pointers to the same data floating around or there is no clear owner for a particular chunk of memory, then smart pointers are a good solution. Boost's shared_ptr would probably be a good one to use, and shared_ptr will be part of C++0x. Many would suggest that you should always use shared pointers, but there is some overhead involved and whether it's best for your particular application will depend entirely on your application.
Ultimately, you need to be aware of the strengths and weaknesses of the various container types and determine what the best container is for whatever you're doing. The same goes for how to deal with pointer management. You need to write your program in a way that it's clear who owns a particular chunk of memory and make sure that that owner frees it when appropriate. Shared pointers are just one solution for that (albeit an excellent one). What the best solution is depends on the particulars of your program.
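For the situation in the question, where class A is the only owner of every B, one way to make the ownership explicit is to store std::unique_ptr in the vector. This is a sketch using the standard library's unique_ptr rather than the shared_ptr mentioned above; the member names and the create function are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct B {
    explicit B(int v) : value(v) {}
    int value;
};

class A {
public:
    // A creates every B and keeps sole ownership of it; clients only
    // ever see a non-owning const pointer.
    const B* create(int value) {
        items_.push_back(std::unique_ptr<B>(new B(value)));
        return items_.back().get();
    }
    std::size_t size() const { return items_.size(); }

private:
    // Each B is deleted automatically when the vector is destroyed.
    std::vector<std::unique_ptr<B>> items_;
};
```

The returned pointers stay valid when the vector grows, because only the unique_ptr objects are moved, not the B instances they point to. They become dangling once the A is destroyed, so clients must not outlive it.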
Why would there be any duplicates in the first place? If class A is the sole entity responsible for creating the instances, and it holds the container privately, meaning there's no way for others to mutate it, it seems to me that there should be no cause for duplicates. Well, if there is, won't that be remediable by some checking prior to adding the pointer to the vector?
I don't know why it would matter if you store a pointer in what kind of container. Containers don't really manipulate their data, they only provide access to them in different ways. So, it's up to you :)
If you need to store pointers in stl containers, use shared_ptr.
Now, the set sounds completely wrong. What are you going to do with those?
If you need to add and remove, then list.
If you need to iterate over range, or all, then vector.
If you need to access specific, knowing a key, then map.
Take a look at others as well. One size doesn't fit all.
My answer is that any decisions that you make have to be made with your goal in mind. If you need a 'no duplicates allowed' rule that a set enforces, then use a set. If not then you might want to use a vector or any container might do the trick.
As for smart_pointers yes, they are really really useful. Should they be used? I don't know, once again I don't know what your end goal is or the problem that you are trying to solve with them.
Basically it comes down to this. If I said "I want to use a hammer. What do you think of that?" you would probably say "Well what for, I know that hammers are pretty good for nails and wood scenarios but they could also be used as a tool to hurt people or maybe as a book stand. Look, just wait a second, what is this for again?" The problem is that I have not really said why I want to use a hammer. I have not said what goal I am trying to achieve.
So if you have sort of an overall goal then why not let us know, then it will be obvious if you are using the right tools for the job and we can help you more.
In my opinion, stick with vector unless you have a real reason not to. Sets come with some runtime overhead as well as quite large semantic overhead compared to a vector.

Creating a scoped custom memory pool/allocator?

Would it be possible in C++ to create a custom allocator that works simply like this:
{
// Limit memory to 1024 KB
ScopedMemoryPool memoryPool(1024 * 1024);
// From here on all heap allocations ('new', 'malloc', ...) take memory from the pool.
// If the pool is depleted these calls result in an exception being thrown.
// Examples:
std::vector<int> integers(10);
int* a = new int[10];
}
I couldn't find something like this in the boost libraries, or anywhere else.
Is there a fundamental problem that makes this impossible?
You would need to create a custom allocator that you pass in as a template param to vector. This custom allocator would essentially wrap the access to your pool and do whatever size validations that it wants.
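A minimal sketch of such an allocator, assuming C++11. It only enforces a byte budget and still takes the memory from the global heap; a real pool would hand out pieces of a preallocated block instead. Budget and BudgetAllocator are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical byte budget shared by every allocator copy pointing at it.
struct Budget {
    std::size_t remaining;
};

template <typename T>
struct BudgetAllocator {
    using value_type = T;

    explicit BudgetAllocator(Budget* b) : budget(b) {}
    template <typename U>
    BudgetAllocator(const BudgetAllocator<U>& other) : budget(other.budget) {}

    T* allocate(std::size_t n) {
        // Size validation: refuse requests that exceed what is left.
        if (n > budget->remaining / sizeof(T)) throw std::bad_alloc();
        budget->remaining -= n * sizeof(T);
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t n) noexcept {
        budget->remaining += n * sizeof(T);
        ::operator delete(p);
    }

    Budget* budget;
};

template <typename T, typename U>
bool operator==(const BudgetAllocator<T>& a, const BudgetAllocator<U>& b) {
    return a.budget == b.budget;
}
template <typename T, typename U>
bool operator!=(const BudgetAllocator<T>& a, const BudgetAllocator<U>& b) {
    return !(a == b);
}
```

It is used as std::vector<int, BudgetAllocator<int>> v(alloc); where alloc was constructed from a Budget that outlives the vector. Plain new and malloc calls are not affected by it.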
Yes you can make such a construct, it's used in many games, but you'll basically need to implement your own containers and call memory allocation methods of that pool that you've created.
You could also experiment with writing a custom allocator for the STL containers, although it seems that that sort of work is generally advised against. (I've done it before and it was tedious, but I don't remember any specific problems.)
Mind you, writing your own memory allocator is not for the faint of heart. You could take a look at Doug Lea's malloc, which provides "memory spaces", which you could use in your scoping construct somehow.
I will answer a different question. Look at 'efficient c++' book. One of the things they discuss is implementing this kind of thing. That was for a web server
For this particular thing you can either mess at the c++ layer by overriding new and supplying custom allocators to the STL.
Or you can mess at the malloc level, start with a custom malloc and work from there (like dmalloc)
Is there a fundamental problem that makes this impossible?
Reasoning about program behavior would become fundamentally impossible. All sorts of weird issues will come up. Certain sections of the code may or may not execute, though this will seemingly have no effect on the next sections, which may work unhindered. Certain sections may always fail. Dealing with the standard library or any other third-party library will become extremely difficult. Fragmentation may occur at run time on some runs and not on others.
If intent is that all allocations within that scope occur with that allocator object, then it's essentially a thread-local variable.
So, there will be multithreading issues if you use a static or global variable to implement it. Otherwise, not a bad workaround for the statelessness of allocators.
(Of course, you'll need to pass a second template argument eg vector< int, UseScopedPool >.)

Boost shared_ptr use_count function

My application problem is the following -
I have a large structure foo. Because these are large and for memory management reasons, we do not wish to delete them when processing on the data is complete.
We are storing them in std::vector<boost::shared_ptr<foo>>.
My question is related to knowing when all processing is complete. First decision is that we do not want any of the other application code to mark a complete flag in the structure because there are multiple execution paths in the program and we cannot predict which one is the last.
So in our implementation, once processing is complete, we delete all copies of the boost::shared_ptr<foo> except for the one in the vector. This will drop the reference count in the shared_ptr to 1. Is it practical to check whether shared_ptr.use_count() equals 1 to know when all other parts of my app are done with the data?
One additional reason I'm asking the question is that the boost documentation on the shared pointer shared_ptr recommends not using "use_count" for production code.
Edit -
What I did not say is that when we need a new foo, we will scan the vector of foo pointers looking for a foo that is not currently in use and use that foo for the next round of processing. This is why I was thinking that having the reference counter of 1 would be a safe way to ensure that this particular foo object is no longer in use.
My immediate reaction (and I'll admit, it's no more than that) is that it sounds like you're trying to get the effect of a pool allocator of some sort. You might be better off overloading operator new and operator delete to get the effect you want a bit more directly. With something like that, you can probably just use a shared_ptr like normal, and the other work you want delayed, will be handled in operator delete for that class.
That leaves a more basic question: what are you really trying to accomplish with this? From a memory management viewpoint, one common wish is to allocate memory for a large number of objects at once, and after the entire block is empty, release the whole block at once. If you're trying to do something on that order, it's almost certainly easier to accomplish by overloading new and delete than by playing games with shared_ptr's use_count.
Edit: based on your comment, overloading new and delete for class sounds like the right thing to do. If anything, integration into your existing code will probably be easier; in fact, you can often do it completely transparently.
The general idea for the allocator is pretty much the same as you've outlined in your edited question: have a structure (bitmaps and linked lists are both common) to keep track of your free objects. When new needs to allocate an object, it can scan the bit vector or look at the head of the linked list of free objects, and return its address.
This is one case that linked lists can work out quite well -- you (usually) don't have to worry about memory usage, because you store your links right in the free object, and you (virtually) never have to walk the list, because when you need to allocate an object, you just grab the first item on the list.
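A sketch of that free-list idea using class-specific operator new and operator delete. It assumes a single-threaded program and never returns memory to the system; FreeNode, free_list_ and free_count are names made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

class Foo final {
public:
    static void* operator new(std::size_t size) {
        if (free_list_ != nullptr) {
            FreeNode* node = free_list_;  // O(1): pop the head, no walking
            free_list_ = node->next;
            return node;
        }
        return ::operator new(size);      // list empty: ask the heap
    }

    static void operator delete(void* p) noexcept {
        if (p == nullptr) return;
        static_assert(sizeof(Foo) >= sizeof(FreeNode), "block too small");
        static_assert(alignof(Foo) >= alignof(FreeNode), "block misaligned");
        // The link is stored inside the freed object's own storage.
        free_list_ = ::new (p) FreeNode{free_list_};
    }

    // Debug helper: number of blocks currently waiting for reuse.
    static std::size_t free_count() {
        std::size_t n = 0;
        for (FreeNode* f = free_list_; f != nullptr; f = f->next) ++n;
        return n;
    }

    double payload[64];  // stands in for the "large structure"

private:
    struct FreeNode { FreeNode* next; };
    static FreeNode* free_list_;
};

Foo::FreeNode* Foo::free_list_ = nullptr;
```

Existing code that writes new Foo and delete ptr, or holds a Foo in a shared_ptr, picks these operators up without changes. A multithreaded program would need a lock or a per-thread list around free_list_.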
This sort of thing is particularly common with small objects, so you might want to look at the Modern C++ Design chapter on its small object allocator (and an article or two since then by Andrei Alexandrescu about his newer ideas of how to do that sort of thing). There's also the Boost::pool allocator, which is generally at least somewhat similar.
If you want to know whether or not the use count is 1, use the unique() member function.
I would say your application should have some method that eliminates all references to the Foo from other parts of the app, and that method should be used instead of checking use_count(). Besides, if use_count() is greater than 1, what would your program do? You shouldn't be relying on shared_ptr's features to eliminate all references, your application architecture should be able to eliminate references. As a final check before removing it from the vector, you could assert(unique()) to verify it really is being released.
I think you can use shared_ptr's custom deleter functionality to call a particular function when the last copy has been released. That way, you're not using use_count at all.
You would need to hold something other than a copy of the shared_ptr in your vector so that the shared_ptr is only tracking the outstanding processing.
Boost has several examples of custom deleters in the shared_ptr docs.
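The custom-deleter approach might look something like this. The sketch uses std::shared_ptr instead of boost::shared_ptr so that it is self-contained; the two accept a deleter in the same way. FooPool, acquire and idle_count are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Foo {
    int data[1024];
};

class FooPool {
public:
    // Hands out a Foo. When the last shared_ptr copy is released, the
    // deleter puts the Foo back on the idle list instead of deleting it.
    std::shared_ptr<Foo> acquire() {
        std::unique_ptr<Foo> foo;
        if (idle_.empty()) {
            foo.reset(new Foo());
        } else {
            foo = std::move(idle_.back());
            idle_.pop_back();
        }
        Foo* raw = foo.release();
        return std::shared_ptr<Foo>(raw, [this](Foo* p) {
            idle_.push_back(std::unique_ptr<Foo>(p));
        });
    }
    std::size_t idle_count() const { return idle_.size(); }

private:
    // The pool holds idle objects directly, not shared_ptr copies, so the
    // reference count only tracks outstanding processing.
    std::vector<std::unique_ptr<Foo>> idle_;
};
```

The pool has to outlive every pointer it hands out, and the sketch is not thread-safe: if processing finishes on several threads, idle_ needs a mutex. The push_back in the deleter can also throw if memory runs out, which a production version would have to rule out, for example by reserving capacity up front.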
I would suggest that instead of trying to use the shared_ptr's use_count to keep track, it might be better to implement your own usage counter. This way you will have full control over it, rather than relying on the shared_ptr's counter which, as you rightly suggest, is not recommended. You can also pre-set your own counter to allow for the number of threads you know will need to act on the data, rather than relying on them all being initialised at the beginning to get their copies of the structure.