pros, I need some performance-opinions with the following:
1st Question:
I want to store objects in a 3D-Grid-Structure, overall it will be ~33% filled, i.e. 2 out of 3 gridpoints will be empty.
Short image to illustrate:
Maybe Option A)
vector<vector<vector<deque<Obj>>>> grid; // (SizeX, SizeY, SizeZ)
grid[x][y][z].push_back(someObj);
This way I'd have a lot of empty deques, but accessing one of them would be fast, wouldn't it?
The Other Option B) would be
std::unordered_map<Pos3D, deque<Obj>, Pos3DHash, Pos3DEqual> Pos3DMap;
where I add&delete deques when data is added/deleted. Probably less memory used, but maybe less fast? What do you think?
2nd Question (follow up)
What if I had multiple containers at each position? Say 3 buckets for 3 different entities, say object types ObjA, ObjB, ObjC per grid point, then my data essentially becomes 4D?
Another illustration:
Using Option 1B I could just extend Pos3D to include the bucket number to account for even more sparse data.
Possible queries I want to optimize for:
Give me all Objects out of ObjA-buckets from the entire structure
Give me all Objects out of ObjB-buckets for a set of grid-positions
Which is the nearest non-empty ObjC-bucket to position x,y,z?
PS:
I had also thought about a tree-based data structure before, reading about nearest-neighbour approaches. Since my data is so regular, I thought I could skip all the tree building, i.e. dividing the cells into smaller pieces, and just make a static 3D grid of the final leaves. That's how I came to ask about the best way to store this grid here.
Question associated with this, if I have a map<int, Obj> is there a fast way to ask for "all objects with keys between 780 and 790"? Or is the fastest way the building of the above mentioned tree?
EDIT
I ended up going with a 3D boost::multi_array that has fortran-ordering. It's a little bit like the chunks games like minecraft use. Which is a little like using a kd-tree with fixed leaf-size and fixed amount of leaves? Works pretty fast now so I'm happy with this approach.
Answer to 1st question
As @Joachim pointed out, this depends on whether you prefer fast access or small data. Roughly, this corresponds to your options A and B.
A) If you want fast access, go with a multidimensional std::vector, or a plain array if you prefer. std::vector is easier to maintain at minimal overhead, so I'd prefer that. It consumes O(N^3) space, where N is the number of grid points along one dimension. To get the best performance when iterating over the data, let the last index vary fastest: for grid[x][y][z], loop over x in the outermost loop and over z in the innermost one.
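As a minimal sketch of option A, the nested vectors can also be flattened into one contiguous std::vector. The class name Grid3D, the placeholder Obj and the method names are assumptions made for illustration, not something from the question. The point is only the index formula and the loop order that walks memory linearly.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

struct Obj { int id; };  // placeholder for the question's Obj type

// Dense 3D grid stored in a single contiguous vector.
// z is the fastest-varying index, x the slowest.
class Grid3D {
public:
    Grid3D(std::size_t sx, std::size_t sy, std::size_t sz)
        : sx_(sx), sy_(sy), sz_(sz), cells_(sx * sy * sz) {}

    std::deque<Obj>& at(std::size_t x, std::size_t y, std::size_t z) {
        assert(x < sx_ && y < sy_ && z < sz_);
        return cells_[(x * sy_ + y) * sz_ + z];
    }

    // Visits the cells in memory order: x outermost, z innermost.
    std::size_t count_objects() {
        std::size_t total = 0;
        for (std::size_t x = 0; x < sx_; ++x)
            for (std::size_t y = 0; y < sy_; ++y)
                for (std::size_t z = 0; z < sz_; ++z)
                    total += at(x, y, z).size();
        return total;
    }

private:
    std::size_t sx_, sy_, sz_;
    std::vector<std::deque<Obj>> cells_;
};
```

Empty cells still cost one empty std::deque each, which is the memory price of option A.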
B) If you instead wish to keep things as small as possible, use a hash map, and use one which is optimized for space. That would result in O(N) space, with N being the number of elements. Here is a benchmark comparing several hash maps. I have had good experiences with google::sparse_hash_map, which has the smallest constant overhead I have seen so far. Plus, it is easy to add to your build system.
If you need a mixture of speed and small data or don't know the size of each dimension in advance, use a hash map as well.
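For option B, here is a rough sketch using std::unordered_map from the standard library rather than google::sparse_hash_map. Pos3D, Pos3DHash and Pos3DEqual are named in the question but not defined there, so the definitions below are assumptions, and so are the helper functions add and pop_front. The hash combiner is deliberately simple and not tuned.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <unordered_map>

struct Obj { int id; };        // placeholder for the question's Obj type
struct Pos3D { int x, y, z; }; // assumed layout of the question's Pos3D

struct Pos3DEqual {
    bool operator()(const Pos3D& a, const Pos3D& b) const {
        return a.x == b.x && a.y == b.y && a.z == b.z;
    }
};

struct Pos3DHash {
    // Simple multiplicative combination of the three coordinate hashes.
    std::size_t operator()(const Pos3D& p) const {
        std::size_t h = std::hash<int>{}(p.x);
        h = h * 31 + std::hash<int>{}(p.y);
        h = h * 31 + std::hash<int>{}(p.z);
        return h;
    }
};

using Pos3DMap = std::unordered_map<Pos3D, std::deque<Obj>, Pos3DHash, Pos3DEqual>;

// Creates the bucket on first use.
void add(Pos3DMap& m, const Pos3D& p, const Obj& o) { m[p].push_back(o); }

// Removes the oldest object at p and erases the bucket once it is empty,
// so the map only ever holds occupied grid points.
bool pop_front(Pos3DMap& m, const Pos3D& p) {
    auto it = m.find(p);
    if (it == m.end()) return false;
    assert(!it->second.empty());
    it->second.pop_front();
    if (it->second.empty()) m.erase(it);
    return true;
}
```

Extending Pos3D with a bucket index, as suggested for the second question, would only require adding the extra field to the struct, the hash and the equality functor.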
Answer to 2nd question
I'd say your data is 4D if you have a variable number of elements along the 4th dimension, or a fixed large number of elements. With option 1B) you'd indeed add the bucket index; for 1A) you'd add another vector.
Which is the nearest non-empty ObjC-bucket to position x,y,z?
This operation is commonly called nearest neighbor search. You want a k-d tree for that. There is libkdtree++, if you prefer small libraries. Otherwise, FLANN might be an option. It is used by the Point Cloud Library, which accomplishes a lot of tasks on multidimensional data and could be worth a look as well.
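On the side question about asking a map<int, Obj> for "all objects with keys between 780 and 790": std::map keeps its keys sorted, so lower_bound and upper_bound give the two ends of the range in logarithmic time without building a separate tree. A small sketch follows; the function name objects_in_range and the placeholder Obj are assumptions.

```cpp
#include <cassert>
#include <map>
#include <vector>

struct Obj { int id; };  // placeholder for the question's Obj type

// Collects every value whose key k satisfies lo <= k <= hi.
std::vector<Obj> objects_in_range(const std::map<int, Obj>& m, int lo, int hi) {
    assert(lo <= hi);
    std::vector<Obj> out;
    auto first = m.lower_bound(lo);  // first element with key >= lo
    auto last = m.upper_bound(hi);   // first element with key > hi
    for (auto it = first; it != last; ++it) out.push_back(it->second);
    return out;
}
```

This only works for ordered containers. std::unordered_map has no equivalent, which is one more trade-off between options A and B.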
I am working on an entity component system for a game engine. One of my goals is to use a data-oriented approach for optimal data processing. In other words, I want to follow the guideline of preferring structs of arrays over arrays of structs. However, I haven't managed to figure out a neat way to do this.
My idea so far is that every component in the system is responsible for a specific part of the game logic, say that the Gravity Component takes care of calculating forces every frame depending on mass, velocity etc. and other components take care of other stuff. Hence every component is interested in different data sets. The Gravity Component might be interested in mass and velocity while the Collision Component might be interested in bounding boxes and position etc.
So far I figured I could have a data manager which saves one array per attribute. So say that entities may have one or more of weight, position, velocity, etc and they would have a unique ID. The data in the data manager would be represented as follows where every number represents an entity ID:
weightarray -> [0,1,2,3]
positionarray -> [0,1,2,3]
velocityarray -> [0,1,2,3]
This approach works well if all entities have each one of the attributes. However, if only entities 0 and 2 have all three attributes and the other ones are entities of the type that does not move, they will not have velocity and the data would look like this:
weightarray -> [0,1,2,3]
positionarray -> [0,1,2,3]
velocityarray -> [0,2] //either squash it like this
velocityarray -> [0, ,2, ] //or leave "empty gaps" to keep alignment
Suddenly it isn't as easy to iterate through it. A component only interested in iterating over and manipulating the velocity would have to somehow skip the empty gaps if I went with the second approach. The first approach of keeping the array short wouldn't work well either in more complicated situations. Say I have one entity 0 with all three attributes, another entity 1 having only weight and position, and an entity 2 which only has position and velocity. Finally there is one last entity 3 which only has weight. The squashed arrays would look like:
weightarray -> [0,1,3]
positionarray -> [0,1,2]
velocityarray -> [0,2]
The other approach would leave gaps like so:
weightarray -> [0,1, ,3]
positionarray -> [0,1,2, ]
velocityarray -> [0, ,2, ]
Both of these situations are nontrivial to iterate if you are only interested in iterating over the set of entities that only has a few of the attributes. A given component X would be interested in processing entities with position and velocity for instance. How can I extract iterable array pointers to give to this component to do its calculations? I would want to give it an array where the elements are just next to each other, but that seems impossible.
I've been thinking about solutions like having a bit field for every array, describing which spots are valid and which are gaps, or a system that copies over data to temporary arrays that have no holes and are then given to the components, and other ideas but none that I thought of was elegant and didn't have additional overhead for the processing (such as extra checks if data is valid, or extra copying of data).
I am asking here because I hope that someone of you might have experience with something similar or might have ideas or thoughts helpful in pursuing this issue. :) Also if this whole idea is crap and impossible to get right and you have a much better idea instead, please tell me. Hopefully the question isn't too long or cluttery.
Thanks.
Good question. However, as far as I can tell, there is no straightforward solution to this problem. There are multiple solutions (some of which you've mentioned) but I don't see an immediate silver bullet solution.
Let's look at the goal first. The goal isn't to put all data in linear arrays, that's just the means to reach the goal. The goal is optimizing performance by minimizing cache misses. That's all. If you use OOP objects, your entity's data will be surrounded by data you don't necessarily need. If your architecture has a cache line size of 64 bytes and you only need weight (float), position (vec3) and velocity (vec3), you use 28 bytes, but the remaining 36 bytes will be loaded anyway. Even worse, when those 3 values are not side by side in memory or your data structure overlaps a cache line boundary, you will load multiple cache lines for just 28 bytes of actually used data.
Now this isn't that bad when you do this a few times. Even if you do it a hundred times, you will hardly notice it. However if you do this thousands of times each second, it may become an issue. So enter DOD, where you optimize for cache utilization, usually by creating linear arrays for each variable, in situations where there are linear access patterns. In your case arrays for weight, position, velocity. When you load the position of one entity, you again load 64 bytes of data. But because your position data is side by side in an array, you don't load 1 position value, you load the data for 5 adjacent entities. The next iteration of your update loop will probably need the next position value, which was already loaded in cache, and so on, until only at the 6th entity it will need to load new data from main memory.
So the goal of DOD isn't using linear arrays, it's maximizing cache utilization by placing data that is accessed at (about) the same time adjacent in memory. If you nearly always access 3 variables at the same time, you don't need to create 3 separate arrays, one per variable; you could just as easily create a struct which contains only those 3 values and create an array of these structs. The best solution always depends on the way you use the data. If your access patterns are linear, but you don't always use all variables, go for separate linear arrays. If your access patterns are more irregular but you always use all variables at the same time, put them in a struct together and create an array of those structs.
So there is your answer in short form: it all depends on your data usage. This is the reason I can't answer your question directly. I can give you some ideas on how to deal with your data, and you can decide for yourself which would be the most useful (if any of them are) in your situation, or maybe you can adapt/mix them up.
You could keep the most accessed data in a contiguous array. For instance, position is used often by many different components, so it is a prime candidate for a contiguous array. Weight, on the other hand, is only used by the gravity component, so there can be gaps here. You've optimized for the most used case and will get less performance for data that is used less often. Still, I'm not a big fan of this solution for a number of reasons: it's still inefficient, and you will load way too much empty data; the lower the ratio of # specific components / # total entities, the worse it gets. If only one in 8 entities has a gravity component, and these entities are spread evenly throughout the arrays, you still get one cache miss for each update. It also assumes all entities will have a position (or whatever the common variable is), it's hard to add and remove entities, and it's inflexible and plain ugly (imho anyway). It may be the easiest solution though.
Another way to solve this is using indexes. Every array for a component will be packed, but there are two extra arrays, one to get the entity ID from a component array index and a second one to get the component array index from an entity ID. Let's say position is shared by all entities while weight and velocity are only used by Gravity. You can now iterate over the packed weight and velocity arrays, and to get/set the corresponding position, you can get the gravityIndex -> entityID value, go to the Position component, and use its entityID -> positionIndex to get the correct index in the Position array. The advantage is that your weight and velocity accesses will no longer give you cache misses, but you still get cache misses for the positions if the ratio between # gravity components / # position components is low. You also get an extra 2 array lookups, but a 16-bit unsigned int index should be enough in most cases, so these arrays will fit nicely into the cache, meaning this might not be a very expensive operation in most cases. Still, profile profile profile to be sure of this!
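A minimal sketch of the index approach, assuming 16-bit entity IDs and a fixed maximum entity count. The names PackedArray, entity_of_, index_of_ and integrate are made up for illustration. Each component type keeps its data packed and carries the two lookup tables described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using EntityId = std::uint16_t;
constexpr std::uint16_t kNone = 0xFFFF;  // marks "entity has no such component"

struct Vec3 { float x, y, z; };

// Packed storage for one component type plus the two index tables.
template <typename T>
class PackedArray {
public:
    explicit PackedArray(std::size_t max_entities) : index_of_(max_entities, kNone) {}

    void add(EntityId e, const T& value) {
        assert(e < index_of_.size() && index_of_[e] == kNone);
        assert(data_.size() < kNone);
        index_of_[e] = static_cast<std::uint16_t>(data_.size());
        data_.push_back(value);
        entity_of_.push_back(e);
    }
    bool has(EntityId e) const { return index_of_[e] != kNone; }
    T& get(EntityId e) { assert(has(e)); return data_[index_of_[e]]; }

    std::size_t size() const { return data_.size(); }
    T& at_index(std::size_t i) { return data_[i]; }
    EntityId entity_at(std::size_t i) const { return entity_of_[i]; }

private:
    std::vector<T> data_;                  // packed, no gaps
    std::vector<EntityId> entity_of_;      // component index -> entity ID
    std::vector<std::uint16_t> index_of_;  // entity ID -> component index
};

// Iterates the packed velocities linearly and reaches each position
// through the two-step lookup (velocity index -> entity ID -> position index).
void integrate(PackedArray<Vec3>& velocity, PackedArray<Vec3>& position, float dt) {
    for (std::size_t i = 0; i < velocity.size(); ++i) {
        const Vec3& v = velocity.at_index(i);
        Vec3& p = position.get(velocity.entity_at(i));
        p.x += v.x * dt;
        p.y += v.y * dt;
        p.z += v.z * dt;
    }
}
```

Removal is left out. A common way to keep the array packed is to move the last element into the freed slot and patch both tables, but whether that fits depends on how often components are removed.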
A third option is data duplication. Now, I'm pretty sure this isn't going to be worth the effort in the case of your Gravity component, I think it's more interesting in computationally heavy situations, but let's take it as an example anyway. In this case, the Gravity component has 3 packed arrays for weight, velocity and position. It also has a similar index table to what you saw in the second option. When you start the Gravity component update, you first update the position array from the original position array in the Position component, using the index table as in example 2. Now you have 3 packed arrays that you can do your calculations with linearly with maximum cache utilization. When you're done, copy the position back to the original Position component using the index table. Now, this won't be faster (in fact probably slower) than the second option if you use it for something like Gravity, because you only read and write position once. However, suppose you have a component where entities interact with each other, with each update pass requiring multiple reads and writes, this may be faster. Still, all depends on access patterns.
The last option I'll mention is a change-based system. You can easily adapt this into something like a messaging system. In this case, you only update data that has changed. In your Gravity component, most objects will be lying on the floor without change, but a few are falling. The Gravity component has packed arrays for position, velocity and weight. If the position is updated during your update loop, you add the entity ID and the new position to a list of changes. When you're done, you send those changes to any other component that's keeping a position value. The same principle applies if any other component (for instance, the player control component) changes the position: it will send the new positions of changed entities, and the Gravity component can listen to that and update only those positions in its positions array. You'll duplicate a lot of data just like in the previous example, but instead of rereading all data every update cycle, you only update data when it changes. This is very useful in situations where small amounts of data actually change each frame, but might get inefficient if large amounts of data change.
So there is no silver bullet. There are a lot of options. The best solution is entirely dependent on your situation, on your data and the way you process that data. Maybe none of the examples I gave are right for you, maybe all of them are. Not every component has to work in the same way; some might use the change/message system while others use the indexes option. Remember that while many DOD performance guidelines are great if you need the performance, they are only useful in certain situations. DOD is not about always using arrays, it is not about always maximizing cache utilization; you should only do this where it actually matters. Profile profile profile. Know your data. Know your data access patterns. Know your (cache) architecture. If you do all of that, solutions will become apparent when you reason about it :)
Hope this helps!
The solution is actually accepting that there are limits on how far you can optimize.
Solving the gap problem will only cause the following to be introduced:
If statements (branches) to handle the data exceptions (entities which are missing a component).
Introducing holes, which means you may as well iterate the lists randomly. The power of DoD is that all data is tightly packed and ordered in the way it will be processed.
What you may want to do:
Create different lists optimized for different systems / cases. Every frame: copy the properties from one system to another system only for the entities that require it (which have that specific component).
Having the following simplified lists and their attributes:
rigidbody (force, velocity, transform)
collision (boundingbox, transform)
drawable (texture_id, shader_id, transform)
rigidbody_to_collision (rigidbody_index, collision_index)
collision_to_rigidbody (collision_index, rigidbody_index)
rigidbody_to_drawable (rigidbody_index, drawable_index)
etc...
For the processes / jobs you may want the following:
RigidbodyApplyForces(...), apply forces (ex. gravity) to velocities
RigidbodyIntegrate(...), apply velocities to transforms.
RigidbodyToCollision(...), copy rigidbody transforms to collision transforms only for entities that have the collision component. The "rigidbody_to_collision" list contains the indices of which rigidbody ID should be copied to which collision ID. This keeps the collision list tightly packed.
RigidbodyToDrawable(...), copy rigidbody transforms to drawable transforms for entities that have the draw component. The "rigidbody_to_drawable" list contains the indices of which rigidbody ID should be copied to which drawable ID. This keeps the drawable list tightly packed.
CollisionUpdateBoundingBoxes(...), update bounding boxes using new transforms.
CollisionRecalculateHashgrid(...), update hashgrid using bounding boxes. You may want to execute this divided over several frames to distribute load.
CollisionBroadphaseResolve(...), calculate possible collisions using hashgrid etc....
CollisionMidphaseResolve(...), calculate collision using bounding boxes from broadphase etc....
CollisionNarrowphaseResolve(...), calculate collision using polygons from midphase etc....
CollisionToRigidbody(...), add reactive forces of colliding objects to rigidbody forces. The "collision_to_rigidbody" list contains the indices from which collision ID the force should be added to which rigidbody ID. You may also create another list called "reactive_forces_to_be_added". You can use that to delay the addition of the forces.
RenderDrawable(...), render the drawables to screen (renderer is just simplified).
Of course you'll need a lot more processes / jobs. You probably want to occlude and sort the drawables, add a transform graph system between the physics and drawables (see Sony presentation about how you may do this) etc. The jobs can be distributed over multiple cores. This is very easy when everything is just a list, as each list can be divided into multiple smaller lists.
When an entity is being created the component data will also be created together and stored in the same order. Meaning the lists will stay mostly in the same order.
In the case of the "copy object to object" processes: if the skipping of holes really becomes a problem, you can always create a "reorder objects" process which, at the end of every frame and distributed over multiple frames, reorders objects into the most optimal order, i.e. the order which requires the least skipping of holes. The skipping of holes is the price to pay to keep all lists as tightly packed as possible, and it also allows them to be ordered in the way they are going to be processed.
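The RigidbodyToCollision(...) job described above could be sketched roughly as follows. The struct layouts are assumptions (only the transform is modelled, and it is reduced to three floats). The list names follow the answer's naming.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Transform { float x, y, z; };  // simplified stand-in for a real transform

// One (rigidbody_index, collision_index) entry of the mapping list.
struct IndexPair { std::size_t src; std::size_t dst; };

struct Rigidbodies { std::vector<Transform> transform; };  // force, velocity omitted
struct Collisions  { std::vector<Transform> transform; };  // boundingbox omitted

// Copies transforms only for entities that have both components.
// Both lists stay tightly packed; only the mapping list knows who pairs with whom.
void RigidbodyToCollision(const Rigidbodies& rb, Collisions& col,
                          const std::vector<IndexPair>& rigidbody_to_collision) {
    for (const IndexPair& link : rigidbody_to_collision) {
        assert(link.src < rb.transform.size() && link.dst < col.transform.size());
        col.transform[link.dst] = rb.transform[link.src];
    }
}
```

If the mapping list is kept sorted by source index, the reads from the rigidbody list move forward through memory, and the only "holes" skipped are rigidbodies without a collision component.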
I rely on two structures for this problem. Hopefully the diagrams are clear enough (I can add further explanation otherwise):
The sparse array allows us to associate data in parallel to another without hogging up too much memory from unused indices and without degrading spatial locality much at all (since each block stores a bunch of elements contiguously).
You might use a smaller block size than 512 since that can be pretty huge for a particular component type. Something like 32 might be reasonable or you might adjust the block size on the fly based on the sizeof(ComponentType). With this you can just associate your components in parallel to your entities without blowing up memory use too much from unoccupied spaces, though I don't use it that way (I use the vertical type of representation, but my system has many component types -- if you only have a few, you might just store everything in parallel).
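Since the diagram is not reproduced here, the following is only a guess at the general shape of such a block-based sparse array, not the author's actual implementation. The class name SparseArray, the fixed capacity and the method names are assumptions. Blocks are allocated the first time an index inside them is written, and each block stores its elements contiguously.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

template <typename T, std::size_t BlockSize = 32>
class SparseArray {
    using Block = std::array<T, BlockSize>;

public:
    explicit SparseArray(std::size_t capacity)
        : capacity_(capacity), blocks_((capacity + BlockSize - 1) / BlockSize) {}

    // Returns the element at index, allocating its block on first use.
    T& operator[](std::size_t index) {
        assert(index < capacity_);
        std::unique_ptr<Block>& block = blocks_[index / BlockSize];
        if (!block) block = std::make_unique<Block>();  // value-initialized elements
        return (*block)[index % BlockSize];
    }

    // Returns nullptr when the block holding index was never allocated.
    const T* find(std::size_t index) const {
        assert(index < capacity_);
        const std::unique_ptr<Block>& block = blocks_[index / BlockSize];
        return block ? &(*block)[index % BlockSize] : nullptr;
    }

    std::size_t allocated_blocks() const {
        std::size_t n = 0;
        for (const auto& b : blocks_) if (b) ++n;
        return n;
    }

private:
    std::size_t capacity_;
    std::vector<std::unique_ptr<Block>> blocks_;  // one pointer per block, null if unused
};
```

Unused regions cost one null pointer per block. An allocated block does not record which of its slots are actually occupied, which is why a second structure is needed for iteration.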
However, we need another structure when iterating to figure out which indices are occupied. There I use a hierarchical bitset (I love and use this data structure a lot, but I don't know if there's a formal name for it since it's just something I made without knowing what it's called):
This allows the elements which are occupied to always be accessed in sequential order (similar to that of using sorted indices). This structure is extremely fast for sequential iteration since testing a single bit might indicate that a million contiguous elements can be processed without checking a million bits or having to store and access a million indices into the container.
As a bonus, it also allows you to do set intersections in a best-case scenario of Log(N)/Log(64) (ex: being able to find the set intersection between two dense index sets containing a million elements each in 3-4 iterations) if you ever need fast set intersections which can often be pretty handy for an ECS.
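Here is a simplified two-level sketch of the idea, written without access to the original diagram, so the layout is an assumption. Each bit of the summary level records whether the corresponding 64-bit leaf word has any bit set, which lets iteration and intersection skip empty regions. A fuller version would add more levels and could also track "all set" regions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class HierarchicalBitset {
public:
    explicit HierarchicalBitset(std::size_t n)
        : size_(n), leaf_((n + 63) / 64, 0), summary_((leaf_.size() + 63) / 64, 0) {}

    void set(std::size_t i) {
        assert(i < size_);
        leaf_[i / 64] |= std::uint64_t{1} << (i % 64);
        summary_[i / 4096] |= std::uint64_t{1} << ((i / 64) % 64);
    }
    void reset(std::size_t i) {
        assert(i < size_);
        leaf_[i / 64] &= ~(std::uint64_t{1} << (i % 64));
        if (leaf_[i / 64] == 0)  // leaf word became empty: clear its summary bit
            summary_[i / 4096] &= ~(std::uint64_t{1} << ((i / 64) % 64));
    }
    bool test(std::size_t i) const {
        assert(i < size_);
        return ((leaf_[i / 64] >> (i % 64)) & 1u) != 0;
    }

    // Calls f(index) for every set bit in ascending order,
    // skipping 4096-bit and 64-bit regions that the summary marks empty.
    template <typename F>
    void for_each(F f) const {
        for (std::size_t s = 0; s < summary_.size(); ++s) {
            if (summary_[s] == 0) continue;
            for (std::size_t w = 0; w < 64; ++w) {
                if (((summary_[s] >> w) & 1u) == 0) continue;
                const std::size_t word = s * 64 + w;
                for (std::size_t b = 0; b < 64; ++b)
                    if ((leaf_[word] >> b) & 1u) f(word * 64 + b);
            }
        }
    }

    // Intersection that only touches leaf words non-empty in both inputs.
    static HierarchicalBitset intersect(const HierarchicalBitset& a,
                                        const HierarchicalBitset& b) {
        assert(a.size_ == b.size_);
        HierarchicalBitset out(a.size_);
        for (std::size_t s = 0; s < a.summary_.size(); ++s) {
            const std::uint64_t both = a.summary_[s] & b.summary_[s];
            for (std::size_t w = 0; both != 0 && w < 64; ++w) {
                if (((both >> w) & 1u) == 0) continue;
                const std::size_t word = s * 64 + w;
                const std::uint64_t bits = a.leaf_[word] & b.leaf_[word];
                if (bits == 0) continue;
                out.leaf_[word] = bits;
                out.summary_[s] |= std::uint64_t{1} << w;
            }
        }
        return out;
    }

private:
    std::size_t size_;
    std::vector<std::uint64_t> leaf_;     // one bit per element
    std::vector<std::uint64_t> summary_;  // one bit per leaf word
};
```

The inner bit loops could be replaced by count-trailing-zeros intrinsics, which is presumably where most of the speed would come from in practice.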
These two structures are the backbones of my ECS engine. They're pretty fast as I can process 2 million particle entities (accessing two different components) without caching the query for the entities with both components at just a little under 30 FPS. Of course that's a crappy frame rate for just 2 million particles, but that's when representing them as entire entities with two components attached each (motion and sprite) with the particle system performing the query every single frame, uncached -- something people would normally never do (better to use like a ParticleEmitter component which represents many particles for a given entity rather than making a particle a whole separate entity itself).
Hopefully the diagrams are clear enough to implement your own version if you're interested.
Rather than addressing the structuring of your data, I'd just like to offer perspective on how I've done stuff like this in the past.
The game engine has a list of managers responsible for various systems in the game (InputManager, PhysicsManager, RenderManager, etc...).
Most things in the 3D world are represented by an Object class, and each Object can have any number of Components. Each component is responsible for different aspects of the object's behavior (RenderComponent, PhysicsComponent, etc...).
The physics component was responsible for loading the physics mesh and giving it all of the necessary properties like mass, density, center of mass, inertia response data, and more. This component also stored information about the physics model once it was in the world, like its position, rotation, linear velocity, angular velocity, and more.
The PhysicsManager had knowledge of every physics mesh that had been loaded by any physics components, this allowed that manager to handle all physics-related tasks, such as collision detection, dispatching collision messages, doing physics ray casts.
If we wanted specialized behavior that only a few objects would need we would create a component for it, and have that component manipulate data like velocity, or friction, and those changes would be seen by the PhysicsManager and accounted for in the physics simulation.
As far as the data structure goes, you can have the system I mentioned above and structure it in several ways. Generally the Objects are kept in either a Vector or Map, and Components are in a Vector or List on the Object. As far as physics information goes, the PhysicsManager has a list of all physics objects, which can be stored in an Array/Vector, and the PhysicsComponent has a copy of its position, velocity, and other data so that it can do anything that it needs to have that data manipulated by the physics manager. For example if you wanted to alter the velocity of an Object you'd just tell the PhysicsComponent, it would alter its velocity value and then notify the PhysicsManager.
I talk more about the subject of object/component engine structure here: https://gamedev.stackexchange.com/a/23578/12611
I'm studying a small part of my game engine and wondering how to optimize some parts.
The situation is quite simple and it is the following:
I have a map of Tiles (stored in a bi-dimensional array) (~260k tiles, but assume many more)
I have a list of Items, each of which is always in exactly one tile
A Tile can logically contain infinite amount of Items
During game execution many Items are continuously created and they start from their own Tile
Every Item continuously changes its Tile to one of the neighbors (up, right, down, left)
Up to now every Item has a reference to its actual Tile, and I just keep a list of items.
Every time an Item moves to an adjacent tile I just update item->tile = .. and I'm fine. This works fine but it's unidirectional.
While extending the engine I realized that I have to find all items contained in a tile many times and this is effectively degrading the performance (especially for some situations, in which I have to find all items for a range of tiles, one by one).
This means I would like to find a data structure suitable to find all the items of a specific Tile better than in O(n), but I would like to avoid much overhead in the "moving from one tile to another" phase (now it's just assigning a pointer, I would like to avoid doing many operations there, since it's quite frequent).
I'm thinking about a custom data structure to exploit the fact that items always move to a neighbor cell, but I'm currently groping in the dark! Any advice would be appreciated, even tricky or cryptic approaches. Unfortunately I can't just waste memory, so a good trade-off is needed too.
I'm developing it in C++ with STL but without Boost. (Yes, I do know about multimap, it doesn't satisfy me, but I'll try if I don't find anything better)
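struct Coordinate { int x, y; };
// std::map needs a strict weak ordering on its key type
bool operator<(const Coordinate& a, const Coordinate& b) { return a.x != b.x ? a.x < b.x : a.y < b.y; }
std::map<Coordinate, std::set<Item*>> tile_items; // requires <map> and <set>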
This maps coordinates on the tile map to sets of Item pointers indicating which items are on that tile. You wouldn't need an entry for every coordinate, only the ones that actually have items on them. Now, I know you said this:
but I would like to avoid much overhead in the "moving from one tile to another" phase
And this method would involve adding more overhead in that phase. But have you actually tried something like this yet and determined that it is a problem?
I would wrap a std::vector in a matrix type (i.e. impose 2D access on a 1D array). This gives you fast random access to any of your tiles, and implementing the matrix is trivial.
use
vector_index = y_pos * x_size + x_pos;
to index a vector of size
vector_size=y_size*x_size;
Then each tile can have a std::vector of items (or maybe a deque, if the number of items on a tile is very dynamic). Again, these are random-access containers with very minimal overhead.
I would stay away from indirect containers for your use case.
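A rough sketch of this layout, with the caveat that TileMap, Item and the method names are invented for illustration and that items are assumed to be owned elsewhere. Moving an item costs one short linear search in its old tile plus one push_back in the new tile.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Item { int id; std::size_t x, y; };  // hypothetical item that knows its tile

class TileMap {
public:
    TileMap(std::size_t x_size, std::size_t y_size)
        : x_size_(x_size), y_size_(y_size), tiles_(x_size * y_size) {}

    // Row-major access: one row holds x_size_ tiles.
    std::vector<Item*>& items_at(std::size_t x, std::size_t y) {
        assert(x < x_size_ && y < y_size_);
        return tiles_[y * x_size_ + x];
    }

    void place(Item& item) { items_at(item.x, item.y).push_back(&item); }

    void move(Item& item, std::size_t new_x, std::size_t new_y) {
        std::vector<Item*>& from = items_at(item.x, item.y);
        auto it = std::find(from.begin(), from.end(), &item);
        assert(it != from.end());
        *it = from.back();  // swap-and-pop: order within a tile is not preserved
        from.pop_back();
        item.x = new_x;
        item.y = new_y;
        items_at(new_x, new_y).push_back(&item);
    }

private:
    std::size_t x_size_, y_size_;
    std::vector<std::vector<Item*>> tiles_;  // one item list per tile
};
```

With ~260k tiles the empty vectors alone take a few megabytes, so whether this fits the "can't waste memory" constraint is something to measure.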
PS: if you want you can have my matrix template.
If you really think having each tile store its items will cost you too much space, consider using a quadtree to store the items instead. This allows you to efficiently get all the items on a tile, but leaves your Tile grid in place for item movement.
[SOLVED]
So I decided to try and create a sorted doubly linked skip list...
I'm pretty sure I have a good grasp of how it works. When you insert x the program searches the base list for the appropriate place to put x (since it is sorted), (conceptually) flips a coin, and if the "coin" lands on a then that element is added to the list above it(or a new list is created with element in it), linked to the element below it, and the coin is flipped again, etc. If the "coin" lands on b at anytime then the insertion is over. You must also have a -infinite stored in every list as the starting point so that it isn't possible to insert a value that is less than the starting point (meaning that it could never be found.)
To search for x, you start at the "top-left" (highest list, lowest value) and "move right" to the next element. If the value is less than x then you continue to the next element, etc. until you have "gone too far" and the value is greater than x. In this case you go back to the last element and move down a level, continuing this chain until you either find x or x is never found.
To delete x you simply search x and delete it every time it comes up in the lists.
For now, I'm simply going to make a skip list that stores numbers. I don't think there is anything in the STL that can assist me, so I will need to create a class List that holds an integer value and has member functions, search, delete, and insert.
The problem I'm having is dealing with links. I'm pretty sure I could create a class to handle the "horizontal" links with a pointer to the previous element and the element in front, but I'm not sure how to deal with the "vertical" links (point to corresponding element in other list?)
If any of my logic is flawed please tell me, but my main questions are:
How to deal with vertical links and whether my link idea is correct
Now that I read my class List idea I'm thinking that a List should hold a vector of integers rather than a single integer. In fact I'm pretty positive, but would just like some validation.
I'm assuming the coin flip would simply call an int function where rand()%2 returns a value of 0 or 1, and if it's 0 then the value "levels up" and if it's 1 then the insert is over. Is this incorrect?
How to store a value similar to -infinite?
Edit: I've started writing some code and am considering how to handle the List constructor....I'm guessing that on its construction, the "-infinite" value should be stored in the vectorname[0] element and I can just call insert on it after its creation to put the x in the appropriate place.
http://msdn.microsoft.com/en-us/library/ms379573(VS.80).aspx#datastructures20_4_topic4
http://igoro.com/archive/skip-lists-are-fascinating/
The above skip lists are implemented in C#, but you can work out a C++ implementation using that code.
Just store 2 pointers. One called above, and one called below in your node class.
Not sure what you mean.
According to wikipedia you can also do a geometric distribution. I'm not sure if the type of distribution matters for totally random access, but it obviously matters if you know your access pattern.
I am unsure of what you mean by this. You can represent something like that with floating point numbers.
You're making "vertical" and "horizontal" too complicated. They are all just pointers. The little boxes you draw on paper with lines on them are just to help visualize something when thinking about them. You could call a pointer "elephant" and it would go to the next node if you wanted it to.
eg. a "next" and "prev" pointer are the exact same as a "above"/"below" pointer.
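To make the "they are all just pointers" point concrete, here is a small sketch of a skip list whose nodes carry four plain pointers. It is one possible layout under assumptions, not the linked C# code: the names Node, SkipList and kMinusInf are made up, the "-infinite" sentinel is simply the smallest int, rand() % 2 is the coin, and deletion is left out.

```cpp
#include <cassert>
#include <cstdlib>
#include <limits>
#include <vector>

constexpr int kMinusInf = std::numeric_limits<int>::min();  // sentinel value

struct Node {
    int value;
    Node* next = nullptr;   // "horizontal" links
    Node* prev = nullptr;
    Node* above = nullptr;  // "vertical" links
    Node* below = nullptr;
    explicit Node(int v) : value(v) {}
};

class SkipList {
public:
    SkipList() : head_(new Node(kMinusInf)) {}
    ~SkipList() {
        for (Node* level = head_; level != nullptr;) {
            Node* lower = level->below;
            for (Node* n = level; n != nullptr;) { Node* nx = n->next; delete n; n = nx; }
            level = lower;
        }
    }
    SkipList(const SkipList&) = delete;
    SkipList& operator=(const SkipList&) = delete;

    bool contains(int x) const {
        if (x == kMinusInf) return false;  // the sentinel is not a stored value
        for (const Node* n = head_; n != nullptr; n = n->below) {
            while (n->next && n->next->value <= x) n = n->next;  // move right
            if (n->value == x) return true;
        }
        return false;
    }

    void insert(int x) {
        assert(x > kMinusInf);
        std::vector<Node*> path;  // rightmost node <= x on each level, top to bottom
        for (Node* n = head_; n != nullptr; n = n->below) {
            while (n->next && n->next->value <= x) n = n->next;
            path.push_back(n);
        }
        Node* lower = nullptr;
        std::size_t remaining = path.size();
        do {
            Node* left;
            if (remaining > 0) {
                left = path[--remaining];
            } else {  // tower grew past the top: add a new list with its own sentinel
                left = new Node(kMinusInf);
                left->below = head_;
                head_->above = left;
                head_ = left;
            }
            Node* node = new Node(x);
            node->prev = left;
            node->next = left->next;
            if (left->next) left->next->prev = node;
            left->next = node;
            node->below = lower;
            if (lower) lower->above = node;
            lower = node;
        } while (std::rand() % 2 == 0);  // 0 = promote one level, 1 = stop
    }

    std::vector<int> to_vector() const {  // walks the sorted base list
        const Node* n = head_;
        while (n->below) n = n->below;
        std::vector<int> out;
        for (n = n->next; n != nullptr; n = n->next) out.push_back(n->value);
        return out;
    }

private:
    Node* head_;  // top-left sentinel
};
```

Each list level is an ordinary doubly linked list of Node objects, so a separate List class holding a vector of integers is not needed for this layout.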
Anyway, good luck with your homework. I got the same homework once in my data structures class.