Algorithm or data structure for broadcast messages in 3D - C++

Let's say some threads produce data, and every piece of data has an associated 3D coordinate. Other threads consume this data, and every consumer thread has a cubic volume of interest described by a center and a "radius" (the size of the cube). Consumer threads can update their cube-of-interest parameters (e.g. move the cube) from time to time. Every piece of data is broadcast: a copy of it should be received by every thread whose cube of interest includes that data's coordinate.
What multi-threaded data structure can be used for this with the best performance? I am using C++, but a pointer to a generic algorithm is fine too.
Bonus: it would be nice if the algorithm could generalize to multiple network nodes (some nodes produce data and some consume it, with the same rules as for threads).
Extra information: there are more consumers than producers, and there are many more data broadcasts than cube-of-interest changes (cube size changes are very rare, but moving the cube is quite common). It's okay if a consumer starts receiving data from its new cube of interest only after some delay (but until then it should continue to receive data from the previous cube).

Your terminology is problematic. A cube by definition does not have a radius; a sphere does. A broadcast by definition is received by everyone, not only by those who are interested; that would be a multicast.
I have encountered this problem in the development of an MMORPG. The approach taken in the development of that MMORPG was a bit wacky, but in the decade that followed my thinking has evolved so I have a much better idea of how to go about it now.
The solution is a bit involved, but it does not require any advanced notions like space partitioning, and it is reusable for all kinds of information that the consumers will inevitably need besides just 3D coordinates. Furthermore, it is reusable for entirely different projects.
We begin by building a light-weight data modelling framework which allows us to describe, instantiate, and manipulate finite, self-contained sets of inter-related observable data known as "Entities" in memory and perform various operations on them in an application-agnostic way.
Description can be done in simple object-relational terms. ("Object-relational" means relational with inheritance.)
Instantiation means that given a schema, the framework creates a container (an "EntitySpace") to hold, during runtime, instances of entities described by the schema.
Manipulation means being able to read and write properties of those entities.
Self-contained means that although an entity may contain a property which is a reference to another entity, the other entity must reside within the same EntitySpace.
Observable means that when the value of a property changes, a notification is issued by the EntitySpace, telling us which property of which entity has changed. Anyone can register for notifications from an EntitySpace, and receives all of them.
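A minimal sketch of the observable part of such an EntitySpace, purely for illustration (all names here are assumptions, not an existing framework's API):

#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using EntityId = std::uint64_t;

struct PropertyChange
{
    EntityId entity;
    std::string property;   // e.g. "Position"
};

class EntitySpace
{
public:
    using Observer = std::function<void(const PropertyChange&)>;

    // Anyone can register; every observer receives every notification.
    void registerObserver(Observer observer) { observers.push_back(std::move(observer)); }

protected:
    // Called by the setters of entity properties stored in this space.
    void notify(const PropertyChange& change)
    {
        for (auto& observer : observers)
            observer(change);
    }

private:
    std::vector<Observer> observers;
};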
Once you have such a framework, you can build lots of useful functionality around it in an entirely application-agnostic way. For example:
Serialization: you can serialize and de-serialize an EntitySpace to and from markup.
Filtering: you can create a special kind of EntitySpace which does not contain storage, and instead acts as a view into a subset of another EntitySpace, filtering entities based on the values of certain properties.
Mirroring: You can keep an EntitySpace in sync with another, by responding to each property-changed notification from one and applying the change to the other, and vice versa.
Remoting: You can interject a transport layer between the two mirrored parts, thus keeping them mirrored while they reside on different threads or on different physical machines.
Every node in the network must have a corresponding "agent" object running inside every node that it needs data from. If you have a centralized architecture, (and I will continue under this hypothesis,) this means that within the server you will have one agent object for each client connected to that server. The agent represents the client, so the fact that the client is remote becomes irrelevant. The agent is only responsible for filtering and sending data to the client that it represents, so multi-threading becomes irrelevant, too.
An agent registers for notifications from the server's EntitySpace and filters them based on whatever criteria you choose. One such criterion, for an Entity which contains a 3D-coordinate property, can be whether that 3D coordinate is within the client's area of interest. The center-of-sphere-and-radius approach will work; the center-of-cube-and-size approach will probably work even better (no need to calculate a square).
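For the area-of-interest test itself, here is a sketch of what the agent could evaluate for every 3D-coordinate change before forwarding it to the client it represents (names are illustrative):

#include <cmath>

struct Vec3 { float x, y, z; };

struct CubeOfInterest
{
    Vec3  center;
    float halfSize;   // half the edge length (the cube's "radius")

    bool contains(const Vec3& p) const
    {
        // Per-axis comparison; unlike a sphere test, no squaring is needed.
        return std::abs(p.x - center.x) <= halfSize
            && std::abs(p.y - center.y) <= halfSize
            && std::abs(p.z - center.z) <= halfSize;
    }
};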


In Vulkan, is it beneficial for the graphics queue family to be separate from the present queue family?

As far as I can tell it is possible for a queue family to support presenting to the screen but not support graphics. Say I have a queue family that supports both graphics and presenting, and another queue family that only supports presenting. Should I use the first queue family for both processes or should I delegate the first to graphics and the latter to presenting? Or would there be no noticeable difference between these two approaches?
No such HW exists, so the best approach is no approach. If you want to be really nice, you can handle the separate-present-queue-family case while expending minimal brain-power on it, though you have no way to test it on real HW that needs it. So I would say that aborting with a nice error message would be adequate until you can get your hands on actual HW that behaves that way.
I think there is a bit of a design error here on Khronos's part. A separate present queue does look like the more explicit way. But then, the present op itself is not a queue operation, so the driver can use whatever it wants anyway. A separate present queue also requires an extra semaphore and a Queue Family Ownership Transfer (or a VK_SHARING_MODE_CONCURRENT resource). As it turned out, no driver has been so extreme as to report a separate present queue, so I made KhronosGroup/Vulkan-Docs#1234.
For a rough notion of what happens at vkQueuePresentKHR, you can inspect the Mesa code: https://github.com/mesa3d/mesa/blob/bf3c9d27706dc2362b81aad12eec1f7e48e53ddd/src/vulkan/wsi/wsi_common.c#L1120-L1232. There is probably no monkey business there with the queue you provided, beyond waiting on your semaphore or at most blitting the image. If you (voluntarily) want to use a separate present queue, you need to measure and whitelist it only for drivers (and probably other influences) where it actually helps (if any such exist, and if it is even worth your time).
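For illustration, queue-family selection that prefers a combined graphics+present family (the only case reported by real HW so far) and only falls back to separate families if it must could look roughly like this; the helper below is a sketch with error handling omitted, not code from any driver or sample:

#include <vulkan/vulkan.h>
#include <optional>
#include <vector>

struct QueueFamilyChoice
{
    std::optional<uint32_t> graphics;
    std::optional<uint32_t> present;
};

QueueFamilyChoice pickQueueFamilies(VkPhysicalDevice device, VkSurfaceKHR surface)
{
    uint32_t count = 0;
    vkGetPhysicalDeviceQueueFamilyProperties(device, &count, nullptr);
    std::vector<VkQueueFamilyProperties> families(count);
    vkGetPhysicalDeviceQueueFamilyProperties(device, &count, families.data());

    QueueFamilyChoice choice;
    for (uint32_t i = 0; i < count; ++i) {
        VkBool32 canPresent = VK_FALSE;
        vkGetPhysicalDeviceSurfaceSupportKHR(device, i, surface, &canPresent);
        const bool hasGraphics = (families[i].queueFlags & VK_QUEUE_GRAPHICS_BIT) != 0;

        if (hasGraphics && canPresent)
            return {i, i};                        // combined family: use it for both
        if (hasGraphics && !choice.graphics)
            choice.graphics = i;
        if (canPresent && !choice.present)
            choice.present = i;
    }
    return choice;   // separate families: the case you currently cannot test on real HW
}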
First off, I assume you mean "beneficial" in terms of performance, and whenever it comes to questions like that you can never have a definite answer except by profiling the different strategies. If your application needs to run on a variety of hardware, you can have it profile the different strategies the first time it's run and save the results locally for repeated use, provide the user with a benchmarking utility they can run if they see poor performance, etc. etc. Trying to reason about it in the abstract can only get you so far.
That aside, I think the easiest way to think about questions like this is to remember that when it comes to graphics programming, you want to both maximize the amount of work that can be done in parallel and minimize the amount of work overall. If you want to present an image from a non-graphics queue and you need to perform graphics operations on it, you'll need to transfer ownership of it to the non-graphics queue when graphics operations on it have finished. Presumably, that will take a bit of time in the driver if nothing else, so it's only worth doing if it will save you time elsewhere somehow.
A common situation where this would probably save you time is if the device supports async compute and also lets you present from the compute queue. For example, a 3D game might use the compute queue for things like lighting, blur, UI, etc. that make the most sense to do after geometry processing is finished. In this case, the game engine would transfer ownership of the image to be presented to the compute queue first anyway, or even have the compute queue own the swapchain image from beginning to end, so presenting from the compute queue once its work for the frame is done would allow the graphics queue to stay busy with the next frame. AMD and NVIDIA recommend this sort of approach where it's possible.
If your application wouldn't otherwise use the compute queue, though, I'm not sure how much sense it makes to present on it when you have the option. The advantage of that approach is that once graphics operations for a given frame are over, you can have the graphics queue immediately release ownership of the image for it and acquire the next one without having to pause to present it, which would allow presentation to be done in parallel with rendering the next frame. On the other hand, you'll have to transfer ownership of it to the compute queue first and set up presentation there, which would add some complexity and overhead. I'm not sure which approach would be faster and I wouldn't be surprised if it varies with the application and environment. Of course, I'm not sure how many realtime Vulkan applications of any significant complexity fit this scenario today, and I'd guess it's not very many, as "per-pixel" things tend to be easier and faster to do with a compute shader.
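If you do end up presenting from the compute queue, the ownership transfer mentioned above boils down to a "release" barrier on the graphics queue and a matching "acquire" barrier on the compute queue. Here is a hedged sketch of the release half, assuming the image was last used as a color attachment; graphicsFamily, computeFamily, swapchainImage and cmd are placeholders, not names from any particular engine:

#include <vulkan/vulkan.h>

// cmd must be a command buffer in the recording state that will be submitted
// to a queue from graphicsFamily; the compute queue must record a matching
// acquire barrier with the same parameters before presenting.
void releaseToComputeFamily(VkCommandBuffer cmd, VkImage swapchainImage,
                            uint32_t graphicsFamily, uint32_t computeFamily)
{
    VkImageMemoryBarrier release{};
    release.sType               = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
    release.srcAccessMask       = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
    release.dstAccessMask       = 0;   // ignored for the release half of the transfer
    release.oldLayout           = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
    release.newLayout           = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
    release.srcQueueFamilyIndex = graphicsFamily;
    release.dstQueueFamilyIndex = computeFamily;
    release.image               = swapchainImage;
    release.subresourceRange    = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 };

    vkCmdPipelineBarrier(cmd,
                         VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
                         VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,
                         0,
                         0, nullptr,
                         0, nullptr,
                         1, &release);
}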

ECS and appropriate usage in games

I've been reading about Entity-Component-Systems and I think I understand the basic concept:
Entities are just IDs with their Components stored in Arrays to reduce cache misses. Systems then iterate over one or more of these Arrays and process the data contained in the Components.
But I don't quite understand how these systems are supposed to efficiently and cleanly interact with one another.
1: If my entity has a health component, how would I go about damaging it?
Just doing health -= damage wouldn't account for dying if health drops to or below 0. But adding a damage() function to the component would defeat the point of components being only data. Basically: how do systems process components which need to respond to their changes and change other components based on their changes? (Without copy-and-pasting the damage code into each system that can possibly inflict damage.)
2: Components are supposed to be data-only structs with no functions. How do I best approach entity-specific behaviour like exploding on death? It seems impractical to fill the Health component with memory-wasting data like explodesOnDeath=false when only one or two out of many entities will actually explode on death. I am not sure how to solve this elegantly.
Is there a common approach to these problems?
Ease of modification (for example with Lua scripts) and good prospects for mod compatibility are important to me, as I really like games with high modding potential. :)
Used Language: C++
I am also new to the field, but here are my experiences with ECS models:
How do systems process components which need to respond to their changes and change other components based on their changes?
As you correctly pointed out, the components are just containers of data, so don't give them functions. All the logic is handled by the systems, and each new piece of logic is handled by a different system. So it's a good choice to separate the logic of "dealing damage" from "killing an entity". The communication between the DamageSystem and the DeathSystem (in other words, when an entity should be killed) can then be based on the HealthComponent.
Possible implementation:
You typically have one system (The DamageSystem) that calculates the new health of an entity. For this purpose, it can use all sorts of information (components) about the entity (maybe your entities have some shield to protect them, etc.). If the health falls below 0, the DamageSystem does not care, as its only purpose is to contain the logic of dealing damage.
Besides the DamageSystem, you also want some sort of DeathSystem that checks for each entity whether its health is below 0. If so, some action is taken. As every entity does something on its death (which is why your explodesOnDeath=false is not a bad idea), it is useful to have a DeathComponent that stores some kind of enum for the death animation (e.g. exploding or just vanishing), a path to a sound file (e.g. a fancy exploding sound) and whatever else you need.
With this approach, all the damage calculation is located in one place and separated from, for example, the logic of an entity's death.
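A minimal sketch of this split (the component layouts, names and event plumbing are assumptions, not a complete ECS):

#include <algorithm>
#include <string>
#include <vector>

struct HealthComponent { float health = 100.f; float shield = 0.f; };

enum class DeathAnimation { Vanish, Explode };

struct DeathComponent
{
    DeathAnimation animation = DeathAnimation::Vanish;
    std::string soundFile;   // e.g. a fancy exploding sound
};

struct Entity
{
    HealthComponent* health = nullptr;   // null if the entity lacks the component
    DeathComponent*  death  = nullptr;
};

struct DamageEvent { Entity* target; float amount; };

// DamageSystem: only contains the logic of dealing damage.
void damageSystem(const std::vector<DamageEvent>& events)
{
    for (const auto& e : events) {
        if (!e.target->health) continue;
        float absorbed = std::min(e.target->health->shield, e.amount);
        e.target->health->health -= e.amount - absorbed;
    }
}

// DeathSystem: only contains the logic of what happens once health reaches 0.
void deathSystem(std::vector<Entity>& entities)
{
    for (auto& entity : entities) {
        if (entity.health && entity.health->health <= 0.f && entity.death) {
            // play entity.death->soundFile, trigger entity.death->animation,
            // then remove the entity from the world ...
        }
    }
}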
Hope this helps!

Is it necessary to include GameObjects whose physics are deterministic in worldUpdate?

In order to reduce data transfer size and the computational time for serializing world objects for each worldUpdate, I was wondering if it is possible to omit syncs for objects whose physics can be entirely, faithfully simulated on the client-side gameEngine (they are not playerObjects so playerInput does not affect them directly, and their physics are entirely deterministic). Interactions with these GameObjects would be entirely handled by GameEvents that are much less frequent. I feel like this should be possible if the client is running the same physics as the server and has access to the same initial conditions.
When I try to omit GameObjects from subsequent worldUpdates, I see that their motion becomes more choppy and they move faster than if they were not omitted; however, when I stop the game server while keeping the client open, their motion is more like what I would expect if I hadn't omitted them. This is all on my local machine with extrapolation synchronization.
The short answer is that the latest version of Lance (1.0.8 at the time of this writing) doesn't support user omission of game objects from world updates, but it does implement a diffing mechanism that omits objects from the update if their netScheme properties haven't changed, which saves bandwidth.
This means that if you have static objects, like walls, for example, they will only get transmitted once to each player. Not transmitting them at all is an interesting feature to have.
If the objects you're referring to are not static, then there is no real way to know their position deterministically. You might have considered using the world step count, but different clients process different world steps at different times due to the web's inherent latency. A client can't know what the true step being handled by the server is at a given point in time, so it cannot deterministically decide on such an object's position. This is why Lance uses the authoritative server model: to allow one single source of truth, and to make sure clients stay synched up.
If you still want to manually avoid sending updates for an object, you can edit its netScheme so that it doesn't return anything but its id, for example:
static get netScheme() {
    return {
        id: { type: Serializer.TYPES.INT32 }
    };
}
This is not a typical use, for the reasons mentioned above, so if you encounter specific sync issues and this is still a feature you're interested in, it's best to submit a feature request in the Lance issue tracker. Make sure to include details on your use case to promote a healthy discussion.

Proper way of updating a system in an ECS

I am currently trying to implement a (sort of) Entity-Component-System.
I've got the gist of it, that is, how an ECS is supposed to work. So far I have 4 classes in my design (not yet fully implemented): EntityWorld is a global container for systems, entities and their respective components; it is responsible for updating/stepping the systems. EntitySystem is the base class for a system, with a virtual update function. Entity is a container, basically a list of components and an id, nothing more. EntityComponent represents a component.
Now, I thought about making it possible to multithread my systems, but I think I've run into a problem here. Suppose my EntityWorld stores its entities in the simplest way possible, for example in a std::vector<Entity*>. That list would either be passed in full to a system when it is updated, or the EntityWorld would loop through the list and hand the entities to the systems one by one. In my understanding, though, when using multiple threads this would require me to lock the whole list every time a system is updated. That would amount to practically zero gain in performance, since the other threads would always be waiting for the list to become free.
Is there a better way to implement this, so that multiple systems can be updated (and read/write entities) at the same time?
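For reference, here is roughly the class layout I have in mind (a rough sketch; member lists are abbreviated):

#include <memory>
#include <vector>

class EntityComponent { /* plain data */ };

class Entity
{
public:
    int id = 0;
    std::vector<std::unique_ptr<EntityComponent>> components;
};

class EntitySystem
{
public:
    virtual ~EntitySystem() = default;
    virtual void update(std::vector<Entity*>& entities, float dt) = 0;
};

class EntityWorld
{
public:
    void step(float dt)
    {
        for (auto& system : systems)
            system->update(entities, dt);   // every system sees the whole list
    }

private:
    std::vector<Entity*> entities;
    std::vector<std::unique_ptr<EntitySystem>> systems;
};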
Thanks in advance!

C++ Networked Program Design: Boost Asio, Serialization, and OStream

Background Info:
I am beginning to learn about networking for a small demo project that I'm working on. I have a server with a bunch of Ball objects that have a variety of parameters (size, color, velocity, acceleration, etc.). I would like the server to be able to do 2 things
Send all of the parameters to the client so that the client can create a new Ball object that's exactly like how it is on the server.
Be able to periodically send smaller updates about the ball that only change some of its parameters (usually position and velocity). The idea is to not redundantly send information.
I'm a little overwhelmed at how to approach this, since there is so much to deal with. My idea was to create a class called ClientUpdate that would be an abstract base class for specific update types that I might want to send.
class ClientUpdate
{
protected:
    UpdateTypes type;   // UpdateTypes is an enum of the different kinds of updates

public:
    ClientUpdate() {}
    virtual ~ClientUpdate() {}   // virtual: subclasses are deleted through base pointers

    void setType(UpdateTypes t) { type = t; }
    virtual void print(std::ostream& where) const;

    friend std::ostream& operator<<(std::ostream& os, const ClientUpdate& obj)
    {
        obj.print(os);
        return os;
    }
};
Then for every event that might occur on the server, like when a ball changes color or changes its state from frozen to not-frozen, I would create a subclass of ClientUpdate to describe the event. The subclasses would have simple variables (strings, integers, booleans) that I would write to the ostream with the print function.
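For example, a color-change update might look roughly like this (BallColorUpdate and its fields are purely illustrative, and it assumes UpdateTypes is a plain enum with some BALL_COLOR value):

// Writes the type first so the client knows how to parse the rest.
class BallColorUpdate : public ClientUpdate
{
    int ballId;
    int newColor;   // e.g. a packed RGB value
public:
    BallColorUpdate(int id, int color) : ballId(id), newColor(color)
    {
        setType(UpdateTypes::BALL_COLOR);
    }
    void print(std::ostream& where) const override
    {
        where << static_cast<int>(type) << ' ' << ballId << ' ' << newColor << ' ';
    }
};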
Finally, I would store all of the updates that happen in a certain area of my game (such as a room) in each update cycle, and then for any clients who are subscribed to that area, I would send 1 large byte array of client updates that would have the form UPDATETYPE_DATA_UPDATETYPE_DATA_....etc. The client would parse the input stream and re-create the update class from it (I haven't written this code yet, but I assume it won't be difficult).
I'm using Boost.Asio for the networking code, and I'm following the tutorials here: http://www.gamedev.net/blog/950/entry-2249317-a-guide-to-getting-started-with-boostasio/?pg=10. I just mention this because I'm pretty sure I want to stick with Boost.Asio, since I'm trying to get very comfortable with Boost and modern C++ in general.
Question:
(1) The basic question is "is this a reasonable way of approaching my problem?" I feel very confident that I could at least make it work, but as a novice at anything network-related, I'm not sure if I am re-inventing wheels or wasting time when there are simpler ways of doing things. In particular, is it inefficient to gather all of the "update" objects together and send them with 1 large write or should I send the individual updates with separate writes to the socket?
(2) For example, I've read about Boost.Serialization, and it seems to be very similar to what I'm doing. However, I am more interested in updating certain member variables of objects that should stay almost the same on both the client and the server. Is Boost.Serialization good for this, or is it more for sending whole objects? Are there any other libraries that do things similar to what I'm describing?
The trade-offs are hard to judge from here.
I can see a few approaches (disclaimer, I didn't try to be exhaustive, just thinking aloud):
every mutation to game state is an "event"; you "journal" events and every once in a while you send a batch of these to the other side. The other side applies them and sends back a checksum verifying that the resulting state matches that on the sending side (at the time of the sending).
alternatively, you treat the whole game state as a "document". Every once in xxx milliseconds, you snapshot the gamestate and send it to the other party. The other party replaces its gamestate with the one from the document. The server could optimize bandwidth by differencing the gamestate against the former (by saving the previously sent snapshot) and sending only the delta.
In that last respect there might be a similarity to the first approach, but there is a fundamental difference: in the first approach, the mutations sent to the other side are exactly the same as they happened on the source system; in the second approach, the 'delta' mutations are synthesized from the effective difference to the last snapshot: they have no relation to the sequence of events that actually led to the current game state.
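To make the first approach a little more concrete, the journal itself can be as simple as this (the types and fields are illustrative):

#include <utility>
#include <vector>

struct BallDelta { int ballId; float x, y, vx, vy; };   // one recorded mutation

struct Journal
{
    std::vector<BallDelta> pending;

    void record(const BallDelta& d) { pending.push_back(d); }

    // Hand back everything recorded since the last flush, leaving the journal empty.
    std::vector<BallDelta> flush() { return std::exchange(pending, {}); }
};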
Now, the trade-offs are plentiful and depend on such factors as:
how big is the full gamestate (a chess board is trivially encoded in a few bytes, while a 3D shooter cannot afford to send whole snapshots, and may not even be able to afford keeping a snapshot for differencing)
how many balls are there, and how are they stored; if they're in a node-based data structure, replacing the whole game state may become expensive (since there might be many allocations).
how many distinct state mutations are there (how complex would the command language get; would it make sense to devise a "command language" for the journal, or would it become too complicated?)
how many events will occur per second (is the number of triggers solely input-based? E.g. in chess, there will be a move once every n seconds, but in a balancing game there may be hundreds of inputs each second, if not more).
Etc. All these questions will make certain approaches more attractive and others less.
One crucial question that you didn't address is: will there be "inputs" on both sides? If so, could there be conflicting inputs? Could there be consequences of changes on one side that lead to a different outcome if the inputs from the other side have been received slightly later?
I won't go into this for now. If you need bi-directional synchronization, you will become very dependent on low latency and frequent updates, so that you can correct divergent gamestates before the difference becomes humanly noticeable and annoying.
I also won't go into how you should send the data, as it depends very much on the chosen approach. If you send full documents, as you've noticed, Boost Serialization would look like a good candidate.