Multi-agent system in C++ code design

I have a simulation written in C++ in which I need to maintain a variable number of agents, and I am having trouble deciding how to implement it well. Every agent looks something like this:
class Agent {
public:
    Vector2f pos;
    float health;
    float data[DATASIZE];
    vector<Rule> rules;
};
I need to maintain a variable number of agents in my simulation such that:
Preferably, there is no upper bound on the number of agents
I can easily add an Agent
I can easily remove any agent under some condition (say health<0)
I can easily iterate all agents and do something (say health--)
Preferably, I can parallelize the work using OpenMP, because many updates are somewhat costly, but completely independent of other agents.
(edit) the order of the agents doesn't matter at all
What kind of container or design principles should I use for the agents? Until now I was using a vector, but I find it pretty hard to erase from this structure: something I need to do quite often, as things die all the time. Are there any alternatives I should look at? I thought of something like std::list, but I'm not sure it can be parallelized well, since it's implemented as a linked list traversed through iterators.
Thank you

You could leave the agent in the list when dead, ready for re-use. No worries about shrinking your container, and you retain the benefits of a vector. You could keep a separate stack of pointers to dead/reusable agents: just push onto it when an agent dies, and pop one off to reclaim for a new agent.
for (Agent& agent : agents) {
    if (agent.health > 0)        // skip dead agents
        processRules(agent);     // stand-in for the per-agent rule processing
}
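A minimal sketch of that free-list idea, using indices rather than raw pointers (push_back can reallocate and invalidate pointers into the vector); killAgent, spawnAgent, and freeList are illustrative names, not from the question:
#include <vector>

std::vector<Agent> agents;       // Agent as defined in the question; never shrinks
std::vector<size_t> freeList;    // indices of dead agents, ready for reuse

void killAgent(size_t i) {
    agents[i].health = 0;        // mark dead; iteration skips it
    freeList.push_back(i);       // remember the slot for reuse
}

size_t spawnAgent(const Agent& a) {
    if (!freeList.empty()) {     // reclaim a dead slot if one exists
        size_t i = freeList.back();
        freeList.pop_back();
        agents[i] = a;
        return i;
    }
    agents.push_back(a);         // otherwise grow the vector
    return agents.size() - 1;
}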

Until now I was using a vector, but I find it pretty hard to erase from this structure: something I need to do quite often, as things die all the time.
How many do you actually expect to die per step of your simulation? What seems like "all the time" to a human could still be very infrequent to a computer. For instance, if each step of your simulation processes thousands of agents but on average only 1 agent dies every few steps, then agent death is a minor incident. With those kinds of numbers, your program spends far more time processing live agents than it does dealing with dead agents, so worrying about the performance of removing a dead agent may not be worthwhile at all. If making agent removal more efficient would end up making normal agent iteration and processing less efficient (yet agent removal is relatively rare), then that would probably be a poor trade-off.
On the other hand, if large numbers of agents are born and die every simulation step, then you might want to make sure those events can be handled efficiently. So it really depends on the kind of numbers you expect to be dealing with.
My general advice would be to proceed with using std::vector (as long as it fits the rest of your design) unless you really expect a significant number of agent deaths per step compared to the number of agents in total.
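Since the question notes that agent order doesn't matter, here is a hedged sketch of the usual "swap with the last element and pop" removal on a std::vector, plus the matching OpenMP loop; update() stands in for the per-agent work, which the question says is independent:
#include <vector>

void update(Agent&); // hypothetical per-agent work

void removeDeadAgents(std::vector<Agent>& agents) {
    for (size_t i = 0; i < agents.size(); ) {
        if (agents[i].health < 0) {
            agents[i] = agents.back(); // overwrite with the last element; order is irrelevant
            agents.pop_back();         // O(1) removal, no shifting
        } else {
            ++i;
        }
    }
}

void updateAgents(std::vector<Agent>& agents) {
    // Independent per-agent work parallelizes trivially over contiguous storage.
    #pragma omp parallel for
    for (long i = 0; i < (long)agents.size(); ++i)
        update(agents[i]);
}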

std::list should work pretty well. It can be parallelized, because inserting or removing an element does not invalidate other iterators (except, of course, iterators pointing to the element being removed).
If you don't need backward traversal, slist (std::forward_list in C++11) is as good as list, and a little faster.
If you don't care about the order of elements, use std::set.

Use a quadtree like in video games. Then searching on pos is fast and removal is fast too. (Plus you can parallelize across child nodes).
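For scale, a minimal point quadtree over agent positions might look like the sketch below; the capacity, the field names, and the far-edge handling are all illustrative choices, not a definitive implementation (Vector2f and Agent as in the question):
#include <cstddef>
#include <memory>
#include <vector>

struct AABB {
    float x, y, w, h; // top-left corner plus size
    bool contains(const Vector2f& p) const {
        return p.x >= x && p.x < x + w && p.y >= y && p.y < y + h;
    }
};

struct QuadTree {
    static const std::size_t Capacity = 8;  // max agents per node before splitting
    AABB bounds;
    std::vector<Agent*> items;
    std::unique_ptr<QuadTree> child[4];     // null until subdivided

    explicit QuadTree(const AABB& b) : bounds(b) {}

    void insert(Agent* a) {
        if (child[0]) { quadrant(a->pos)->insert(a); return; }
        items.push_back(a);
        if (items.size() > Capacity) {
            subdivide();
            for (Agent* moved : items) quadrant(moved->pos)->insert(moved);
            items.clear();
        }
    }

private:
    void subdivide() {
        float hw = bounds.w / 2, hh = bounds.h / 2;
        child[0].reset(new QuadTree(AABB{bounds.x,      bounds.y,      hw, hh}));
        child[1].reset(new QuadTree(AABB{bounds.x + hw, bounds.y,      hw, hh}));
        child[2].reset(new QuadTree(AABB{bounds.x,      bounds.y + hh, hw, hh}));
        child[3].reset(new QuadTree(AABB{bounds.x + hw, bounds.y + hh, hw, hh}));
    }
    QuadTree* quadrant(const Vector2f& p) {
        for (int i = 0; i < 3; ++i)
            if (child[i]->bounds.contains(p)) return child[i].get();
        return child[3].get(); // points on the far edges land in the last child
    }
};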

Related

How to LRU-cache numerous objects made of C++ STL heavy structures?

I have big C++/STL data structures (myStructType) with nested lists and maps. I have many objects of this type that I want to LRU-cache with a key. I can reload objects from disk when needed. Moreover, it has to be shared in a multiprocessing high-performance application running on a BSD platform.
I can see several solutions:
I can consider a lifetime-sorted list of pair<size_t lifeTime, myStructType v>, plus a map for O(1) access from a key to the index of the desired object in the list. I can use shm and mmap to store everything, and a lock to manage access.
I can use a Redis server configured for LRU, and redesign my data structures as Redis key/value and key/list pairs.
I can use a Redis server configured for LRU, and serialise my data structures (myStructType) so I have simple key/value pairs to manage with Redis.
There may be other solutions, of course. How would you do that, or better, how have you successfully done that, keeping high performance in mind?
In addition, I would like to avoid heavy dependencies like Boost.
I actually built caches (not only LRU) recently.
Options 2 and 3 are quite likely not faster than re-reading from disk. That's effectively no cache at all. Also, this would be a far heavier dependency than Boost.
Option 1 can be challenging. For instance, you suggest "a lock". That would be quite a contended lock, as it must protect each and every lifetime update, plus all LRU operations. Since your objects are already heavy, it may be worthwhile to have a unique lock per object. There are intermediate variants of this solution, where there is more than one lock, but also more than one object per lock. (You still need a lock to protect the whole map, but that's for replacement only.)
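A minimal sketch of that "more than one object per lock" variant, often called lock striping; the stripe count and the names here are arbitrary:
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>

// N mutexes shared across many keys: far less contention than one global
// lock, far fewer locks than one per object.
template <std::size_t N = 16>
class StripedLocks {
    std::mutex locks[N];
public:
    std::mutex& forKey(const std::string& key) {
        return locks[std::hash<std::string>()(key) % N];
    }
};

// Usage: std::lock_guard<std::mutex> guard(stripes.forKey(key));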
You can also consider whether you really need strict LRU. That strategy assumes that the chances of an object being reused decrease over time. If that's not actually true, random replacement is just as good. You can also consider evicting more than one element at a time. One of the challenges is that when an element needs removing, every thread will see that, but it's sufficient for one thread to remove it. That's why batch removal helps: if a thread tries to take the lock for batch removal and fails, it can continue under the assumption that the cache will have free space soon.
One quick win is to not update the LRU time of the last used element. It was already the newest, making it any newer won't help. This of course only has an effect if you often use that element quickly again, but (as noted above) otherwise you'd just use random eviction.
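To make option 1 concrete, here is a hedged single-threaded sketch of the classic list-plus-map LRU core, including that quick win; the locking discussed above would be layered on top, and the key/value types are placeholders:
#include <cstddef>
#include <list>
#include <unordered_map>
#include <utility>

template <typename K, typename V>
class LruCache {
    std::size_t capacity;
    std::list<std::pair<K, V>> order;  // front = most recently used
    std::unordered_map<K, typename std::list<std::pair<K, V>>::iterator> index;
public:
    explicit LruCache(std::size_t cap) : capacity(cap) {}

    V* get(const K& key) {
        auto it = index.find(key);
        if (it == index.end()) return nullptr;
        if (it->second != order.begin())                     // the quick win: skip the
            order.splice(order.begin(), order, it->second);  // update if already newest
        return &it->second->second;
    }

    void put(const K& key, const V& value) {
        if (V* hit = get(key)) { *hit = value; return; }
        if (order.size() >= capacity) {       // evict the least recently used
            index.erase(order.back().first);
            order.pop_back();
        }
        order.emplace_front(key, value);
        index[key] = order.begin();
    }
};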

Fastest C++ map for worst-case time execution in a real time app?

I'm trying to figure out how I want to store timed events in a real-time audio app that may hop around in time a lot and needs to run with the lowest latency possible. Basically the engine knows what time 'now' is, but 'now' may be non-linear, and there may be multiple 'nows' in the future. I'm wondering:
a) whether a C++ map of some type, keyed by time values, is even feasible when there could be thousands of entries
b) which map or hash table implementation will give me the best performance, where best means lowest worst-case execution time, not lowest average. An implementation that even once in a while takes a really long time will be unusable; something with a more deterministic result would be better.
c) whether, for a bunch of events sharing the same 'now', one should use some sort of hash multimap or link a list of all events at a given time
I'm open to any other suggestions on how to do this, or pointers to resources. Time is encoded in its own format, representing sections:bars:beats:ticks
thanks!
iain
Nothing can save you from having to profile your code and see for yourself.
Make the data type as easy to change as possible, keep everything modular and parameterised, and then just run some tests.
Start with std::multimap and std::unordered_multimap, with time as the key. Both should have pretty good performance. Try a few different allocators, too.
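As a starting point, a hedged sketch with std::multimap, assuming sections:bars:beats:ticks can be flattened into one comparable integer; Event and dispatch() are placeholders:
#include <cstdint>
#include <map>

typedef std::uint64_t Ticks;            // sections:bars:beats:ticks, flattened
struct Event { /* payload */ };
void dispatch(const Event&);            // hypothetical per-event handler

std::multimap<Ticks, Event> schedule;   // duplicate keys = simultaneous events

// Dispatch every event that falls inside one audio window [now, now + span).
void processWindow(Ticks now, Ticks span) {
    auto first = schedule.lower_bound(now);
    auto last  = schedule.lower_bound(now + span);
    for (auto it = first; it != last; ++it)
        dispatch(it->second);
}
For the worst-case requirement, note that std::multimap gives O(log n) lookups with no rehashing, while std::unordered_multimap has better averages but can hit occasional rehash spikes.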

Organizing a task-based scientific computation

I have a computational algebra task I need to code up. The problem is broken into well-defined individual tasks that naturally form a tree - the task is combinatorial in nature, so there's a main task which requires a small number of sub-calculations to get its results. Those sub-calculations have sub-sub-calculations and so on. Each calculation only depends on the calculations below it in the tree (assuming the root node is the top). No data sharing needs to happen between branches. At lower levels the number of subtasks may be extremely large.
I had previously coded this up in a functional fashion, calling the functions as needed and storing everything in RAM. This was a terrible approach, but I was more concerned about the theory then.
I'm planning to rewrite the code in C++ for a variety of reasons. I have a few requirements:
Checkpointing: The calculation takes a long time, so I need to be able to stop at any point and resume later.
Separate individual tasks as objects: This helps me keep a good handle of where I am in the computations, and offers a clean way to do checkpointing via serialization.
Multi-threading: The task is clearly embarrassingly parallel, so it'd be neat to exploit that. I'd probably want to use Boost threads for this.
I would like suggestions on how to actually implement such a system. Ways I've thought of doing it:
Implement tasks as a simple stack. When you hit a task that needs subcalculations done, it checks if it has all the subcalculations it requires. If not, it creates the subtasks and throws them onto the stack. If it does, then it calculates its result and pops itself from the stack.
Store the tasks as a tree and do something like a depth-first visitor pattern. This would create all the tasks at the start and then computation would just traverse the tree.
These don't seem quite right because of the problem of the lower levels requiring a vast number of subtasks. I could approach it in an iterator fashion at this level, I guess.
I feel like I'm over-thinking it and there's already a simple, well-established way to do something like this. Is there one?
Technical details in case they matter:
The task tree has 5 levels.
Branching factor of the tree is really small (say, between 2 and 5) for all levels except the lowest which is on the order of a few million.
Each individual task would only need to store a result tens of bytes large. I don't mind using the disk as much as possible, so long as it doesn't kill performance.
For debugging, I'd have to be able to recall/recalculate any individual task.
All the calculations are discrete mathematics: calculations with integers, polynomials, and groups. No floating point at all.
there's a main task which requires a small number of sub-calculations to get its results. Those sub-calculations have sub-sub-calculations and so on. Each calculation only depends on the calculations below it in the tree (assuming the root node is the top). No data sharing needs to happen between branches. At lower levels the number of subtasks may be extremely large... blah blah resuming, multi-threading, etc.
Correct me if I'm wrong, but it seems to me that you are exactly describing a map-reduce algorithm.
Just read what Wikipedia says about map-reduce:
"Map" step: The master node takes the input, partitions it up into smaller sub-problems, and distributes those to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes that smaller problem, and passes the answer back to its master node.
"Reduce" step: The master node then takes the answers to all the sub-problems and combines them in some way to get the output – the answer to the problem it was originally trying to solve.
Using an existing mapreduce framework could save you a huge amount of time.
I just googled "map reduce C++" and started getting results, notably one proposed for Boost: http://www.craighenderson.co.uk/mapreduce/
These don't seem quite right because of the problem of the lower levels requiring a vast number of subtasks. I could approach it in an iterator fashion at this level, I guess.
You definitely do not want millions of CPU-bound threads. You want at most N CPU-bound threads, where N is the product of the number of CPUs and the number of cores per CPU on your machine. Exceed N by a little bit and you are slowing things down a bit. Exceed N by a lot and you are slowing things down a whole lot. The machine will spend almost all its time swapping threads in and out of context, spending very little time executing the threads themselves. Exceed N by a whole lot and you will most likely crash your machine (or hit some limit on threads). If you want to farm lots and lots (and lots and lots) of parallel tasks out at once, you either need to use multiple machines or use your graphics card.
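A hedged sketch of capping the work at N threads over a shared queue of independent tasks; the question mentions Boost threads, but std::thread (C++11), shown here, is the same idea:
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Drain a queue of independent tasks with at most hardware_concurrency() threads.
void runAll(std::queue<std::function<void()> >& tasks) {
    std::mutex m;
    unsigned n = std::thread::hardware_concurrency(); // ~ CPUs * cores per CPU
    if (n == 0) n = 2;                                // fallback when unknown
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < n; ++i) {
        workers.emplace_back([&tasks, &m] {
            for (;;) {
                std::function<void()> task;
                {
                    std::lock_guard<std::mutex> lock(m);
                    if (tasks.empty()) return;        // queue drained: worker exits
                    task = tasks.front();
                    tasks.pop();
                }
                task();                               // run outside the lock
            }
        });
    }
    for (std::thread& w : workers) w.join();
}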

Setting a memory limit on my AI algorithm?

I am working on a Tetris AI implementation. It is a GUI application that plays the game by itself. The user can manipulate a few parameters that influence the decisions made by the AI. The basic algorithm goes as follows:
Start a new thread and clone the current game state to avoid excessive locking.
Generate a list of all possible future game states. These become child nodes of the current game state.
For each child node, generate its future game states.
Keep doing this recursively until a predefined depth has been reached.
Once the requested depth has been reached, find the best end node, then recursively find its parent nodes until you have the first-level child.
Delete all child nodes that are not on the path between the child node and the end node.
This path is now the list of precalculated moves.
The main game executes the list of precalculated moves (with some fancy animation).
This is working pretty well up until search depth 4. After that I start to get memory problems. The number of possible game states can go from 9 to 34. So the worst case scenario for a level 4 search would be 34^4 game states. Windows XP seems unable to deal with level 5 searches (it hangs at 2+ GB).
So if I want to use deeper searches, I'll need a strategy where I delete the non-promising branches and continue with the ones that will lead to a good score. But this makes it harder to estimate a maximum acceptable search depth. Therefore I think it would be better to specify a memory limit instead of a depth limit.
I am considering using a memory pool and "placement new" to create my objects in the pool's memory segments. However, the game grid is implemented as an STL vector, so in order to allocate it in the pool I need to implement a custom allocator.
This seems quite a challenge and perhaps I'm overlooking a simpler solution. So I'd like your insights on how to best deal with this.
Can boost, or another library, provide me some of these facilities? (I already found Poco's MemoryPool.) Are there any good online resources to help me get going?
FYI: here's the source code and a sample binary for Windows.
You can create a memory pool, etc, but that won't really make it any easier or harder to count game state instances. You do need to make sure you don't go over a certain number of active states in your decision tree, with or without a pool. And Boost does have one: http://www.boost.org/doc/libs/1_44_0/libs/pool/doc/index.html
It sounds like you're not really doing any pruning of the tree, which would allow you to get much deeper. Evaluate each future game state and drop ones unlikely to develop into anything useful, and don't waste your time going down that branch.
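One hedged way to phrase that pruning is a beam search, which caps memory as a side effect: keep only the best K states at each depth. expand() and score() below are placeholders for the game-specific parts:
#include <algorithm>
#include <cstddef>
#include <vector>

struct GameState { /* grid, piece queue, ... */ };
std::vector<GameState> expand(const GameState&); // hypothetical: all child states
double score(const GameState&);                  // hypothetical: board heuristic

// Expand one search depth, then keep only the beamWidth most promising states.
std::vector<GameState> nextDepth(const std::vector<GameState>& current,
                                 std::size_t beamWidth) {
    std::vector<GameState> next;
    for (const GameState& s : current) {
        std::vector<GameState> children = expand(s);
        next.insert(next.end(), children.begin(), children.end());
    }
    if (next.size() > beamWidth) {
        std::partial_sort(next.begin(), next.begin() + beamWidth, next.end(),
                          [](const GameState& a, const GameState& b) {
                              return score(a) > score(b); // higher score first
                          });
        next.resize(beamWidth); // memory stays bounded regardless of depth
    }
    return next;
}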
Despite the lack of context [what kind of search are you doing? Depth-first, breadth-first, A*? ...], my suggestion is:
Use semaphores to limit the amount of processing done at one time, and release once the processing has been evaluated. I can't really recommend a specific library that includes semaphores, as threading is not built into C++; check your framework's documentation.
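For reference, C++20 has since added a counting semaphore to the standard library; a minimal sketch of the suggested cap, with evaluate() as a placeholder:
#include <semaphore>

struct GameState;            // the tree node type from the question
void evaluate(GameState&);   // hypothetical expansion/scoring work

std::counting_semaphore<8> slots(8); // at most 8 states processed at once

void processState(GameState& s) {
    slots.acquire();         // wait for a free slot
    evaluate(s);
    slots.release();         // hand the slot to the next waiting caller
}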

Making a game in C++ using parallel processing

I wanted to "emulate" a popular flash game, Chrontron, in C++ and needed some help getting started. (NOTE: Not for release, just practicing for myself)
Basics:
Player has a time machine. On each use of the time machine, a parallel state is created, co-existing with a previous state. One of the states must complete all the objectives of the level before ending the stage. In addition, all the states must be able to end the stage normally, without causing a state paradox (wherein they should have been able to finish the stage normally but, due to the interactions of another state, were not).
So, that sort of explains how the game works. You should play it a bit to really
understand what my problem is.
I'm thinking a good way to solve this would be to use linked lists to store each state, where each state will probably be either a hash map keyed on time or a linked list that iterates based on time. I'm still unsure.
ACTUAL QUESTION:
Now that I have some rough specs, I need some help deciding on which data structures to use for this, and why. Also, I want to know what Graphics API/Layer I should use to do this: SDL, OpenGL, or DirectX (my current choice is SDL). And how would I go about implementing parallel states? With parallel threads?
EDIT (To clarify more):
OS -- Windows (since this is a hobby project, may do this in Linux later)
Graphics -- 2D
Language -- C++ (must be C++ -- this is practice for a course next semester)
Q-Unanswered: SDL : OpenGL : Direct X
Q-Answered: Avoid Parallel Processing
Q-Answered: Use STL to implement time-step actions.
So far from what people have said, I should:
1. Use STL to store actions.
2. Iterate through actions based on time-step.
3. Forget parallel processing -- period. (But I'd still like some pointers as to how it could be used and in what cases it should be used, since this is for practice.)
Appending to the question: I've mostly used C#, PHP, and Java before, so I wouldn't describe myself as a hotshot programmer. What C++-specific knowledge would help make this project easier for me? (e.g. vectors?)
What you should do first is read and understand the "fixed time-step" game loop (here's a good explanation: http://www.gaffer.org/game-physics/fix-your-timestep).
Then you keep a list of lists of pairs of frame counter and action. STL example:
std::list<std::list<std::pair<unsigned long, Action> > > state;
Or maybe a vector of lists of pairs. To create the state, for every action (player interaction) you store the frame number and the action performed; most likely you'd get the best results if the action simply was "key <X> pressed" or "key <X> released":
state.back().push_back(std::make_pair(currentFrame, VK_LEFT | KEY_PRESSED));
To play back the previous states, you'd have to reset the frame counter every time the player activates the time machine and then, for each previous state, iterate through its state list and see if any entry matches the current frame. If one does, perform the action for that state.
To optimize, you could keep a list of iterators to where you are in each previous state-list. Here's a sketch of that:
typedef std::list<std::pair<unsigned long, Action> > StateList;
std::list<StateList::iterator> stateIteratorList;

// Each frame: replay any action recorded for this frame in each timeline.
// (A full implementation would also check each iterator against its
// list's end() once a timeline runs out of actions.)
for (StateList::iterator& it : stateIteratorList)
{
    if (it->first == currentFrame)
    {
        performAction(it->second);
        ++it; // advance to this timeline's next recorded action
    }
}
I hope you get the idea...
Separate threads would simply complicate the matter greatly. This way you get the same result every time, which you cannot guarantee with separate threads (I can't really see how that would be implemented) or with a non-fixed time-step game loop.
When it comes to graphics API, I'd go with SDL as it's probably the easiest thing to get you started. You can always use OpenGL from SDL later on if you want to go 3D.
This sounds very similar to Braid. You really don't want parallel processing for this - parallel programming is hard, and for something like this, performance should not be an issue.
Since the game state vector will grow very quickly (probably on the order of several kilobytes per second, depending on the frame rate and how much data you store), you don't want a linked list, which has a lot of overhead in terms of space (and can introduce big performance penalties due to cache misses if it is laid out poorly). For each parallel timeline, you want a vector data structure. You can store each parallel timeline in a linked list. Each timeline knows at what time it began.
To run the game, you iterate through all active timelines and perform one frame's worth of actions from each of them in lockstep. No need for parallel processing.
I have played this game before. I don't necessarily think parallel processing is the way to go. You have shared objects in the game (levers, boxes, elevators, etc.) that will need to be shared between processes, possibly on every delta, thereby reducing the effectiveness of the parallelism.
I would personally just keep a list of actions, then for each subsequent iteration start interleaving them together. For example, if the list is in the format <[iteration.action]>, then the 3rd time through would execute actions 1.1, 2.1, 3.1, 1.2, 2.2, 3.2, etc.
After briefly glancing over the description, I think you have the right idea. I would have a state object that holds the state data and place it into a linked list... I don't think you need parallel threads...
As far as the graphics API goes, I have only used OpenGL, and can say that it is pretty powerful and has a good C/C++ API. OpenGL would also be more cross-platform, as you can use the Mesa library on *nix computers.
A very interesting game idea. I think you are right that parallel computing could be beneficial to this design, but no more than for any other high-resource program.
The question is a bit ambiguous. I see that you are going to write this in C++, but what OS are you coding it for? Do you intend it to be cross-platform, and what kind of graphics would you like, i.e. 3D, 2D, high-end, web-based?
So basically we need a lot more information.
Parallel processing isn't the answer. You should simply "record" the player's actions, then play them back as the "previous" actions.
So you create a vector (or singly linked list) of vectors that holds the actions. Simply store the frame number when the action was taken (or the delta) and perform that action on the "dummy bot" that represents the player during that particular instance. You simply loop through the states and trigger them one after another.
You get a side effect of easily "breaking" the game when a state paradox happens, simply because the next action fails.
Unless you're desperate to use C++ for your own education, you should definitely look at XNA for your game & graphics framework (it uses C#). It's completely free, it does a lot of things for you, and soon you'll be able to sell your game on Xbox Live.
To answer your main question: nothing that you can already do in Flash would ever need more than one thread. Just store a list of positions in an array and loop through it with a different offset for each robot.
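A minimal sketch of that offset replay, assuming one position is appended per frame; the types and names are illustrative:
#include <cstddef>
#include <vector>

struct Position { float x, y; };
std::vector<Position> recorded;     // one entry appended per frame while playing

// Each "previous" robot replays the same recording, shifted by its start frame.
Position robotPosition(std::size_t robotStartFrame, std::size_t currentFrame) {
    std::size_t offset = currentFrame - robotStartFrame;
    return recorded[offset];        // assumes offset < recorded.size()
}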