If I have a server running on my machine, and several clients running on other networks, what are some concepts of testing for synchronicity between them? How would I know when a client goes out-of-sync?
I'm particularly interested in how network programmers in the field of game design do this (or just any continuous network exchange application), where realtime synchronicity would be a commonly vital aspect of success.
I can see how this may be easily achieved on a LAN via side-by-side comparisons on separate machines... but once you branch the scenario out to include clients on foreign networks, I'm just not sure how it can be done without clogging up your messaging system with debug information, and thereby changing the very behaviour you are trying to observe, since that debug info wouldn't normally be passed over the network.
So what are some ways that people get around this issue?
For example, do they simply induce/simulate latency on the local network before launching to foreign networks, and then hope for the best? I'm hoping there are some more concrete solutions, but this is what I'm doing in the meantime...
When you say synchronized, I believe you are talking about network latency. Meaning that a client on a local network may get its game information sooner than a client on the other side of the country. Correct?
If so, then I'm sure you can look for books or papers that cover this kind of topic, but I can give you at least one way to detect this latency and provide a way to manage it.
To detect latency, your server can use a type of trace route program to determine how long it takes for data to reach each client. A common Linux program example can be found here http://linux.about.com/library/cmd/blcmdl8_traceroute.htm. While the server is handling client data, it can also continuously collect the latency statistics and provide the data to the clients. For example, the server can update each client on its own network latency and what the longest latency is for the group of clients that are playing each other in a game.
The clients can then use the latency differences to determine when they should process the data they receive from the server. For example, a client is told by the server that its network latency is 50 milliseconds and that the maximum latency for its group is 300 milliseconds. The client then knows to wait 250 milliseconds before processing game data from the server. That way, each client processes game data from the server at approximately the same time.
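To make that concrete, here is a minimal sketch of the idea in Python, assuming the server has already told the client its own latency and the group maximum; all names here are placeholders, not from any real engine:

```python
import time

def process_when_synchronized(my_latency_ms, group_max_latency_ms, update, apply_update):
    """Delay processing so all clients in the group apply this update at roughly the same time."""
    delay_ms = max(0, group_max_latency_ms - my_latency_ms)  # e.g. 300 - 50 = 250 ms
    time.sleep(delay_ms / 1000.0)
    apply_update(update)
```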
There are many other (and probably better) ways to handle this situation, but that should get you started in the right direction.
I'm new to AWS and back-end architecture in general. My current configuration is an EC2 instance (south-east region Singapore) running a Twisted real-time server for a real-time chat app.
Currently, in my implementation, whenever a sender sends a message to the server, it is stored in a Python dictionary on the server if the receiver is not online. So basically it is storing this message in the instance's RAM. Now, I want to make the app available worldwide, so I'll be running it on instances in different regions. So my question is: how am I supposed to duplicate/replicate this dictionary stored in the RAM of one instance across all the other instances, so it is readily available in all regions? (The reason for storing the messages in RAM and not in a database is the nature of the app: it involves a large volume of messages sent in bursts, which requires it to be considerably faster than the speeds offered by a persistent DB store's I/O reads and writes.) My aim is to make the app available globally while keeping real-time performance.
(Kindly don't flag this question as an "opinion-based" question and close it. I'm new to server side architecture and I really need someone to at least just point me in the right direction. And I don't think I'll be able to find help on this anywhere other than StackOverflow.)
Here are a few things I would think of if I had to build it myself (I've implemented most of these pointers in our own project and it took me quite a while):
If you really, really need all servers to be in sync, you'll need a consensus protocol. If you do, don't build this yourself; it's going to cost a lot of time and errors.
If you can, partition your chat data into chatrooms and have only a few servers handle one chatroom.
I've used msgpack to encode my data. It's faster and smaller than JSON.
You'll benefit a lot from compressing your data before you send it over the wire. Have a look at something like zlib or lz4.
Even though compressed msgpack is almost the same size as compressed JSON, I'd choose msgpack because it's faster. It's also easier to parse, because it's length-prefix encoded.
I would try to send messages together: batch up all messages every x ms (in my project I chose 100 ms). Batching up messages will save you a lot of bandwidth, since your compression algorithm can remove more duplication.
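A rough sketch of that batch-and-compress idea, assuming msgpack and zlib are available; queue_message, flush and send_bytes are made-up names standing in for your own transport code:

```python
import zlib
import msgpack  # pip install msgpack

BATCH_INTERVAL = 0.1  # 100 ms, as mentioned above
pending = []          # messages queued since the last flush

def queue_message(msg):
    pending.append(msg)

def flush(send_bytes):
    """Pack everything queued so far into one compressed, length-prefixed frame."""
    global pending
    if not pending:
        return
    payload = zlib.compress(msgpack.packb(pending))
    frame = len(payload).to_bytes(4, "big") + payload  # simple length prefix
    send_bytes(frame)
    pending = []

# Your event loop would call flush(...) every BATCH_INTERVAL seconds.
```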
You'll have to handle connection timeouts. Only regard a message as sent and done when you get a reply back (you'll have to design/choose your protocol to handle that)
Think about what is acceptable: how much data you're willing to lose when something crashes or otherwise fails. If you're not willing to lose data, you'll have to implement something that stores data to disk.
I've had the problem that writes to the database we use (Google Cloud Datastore) take a long time as well, somewhere between 100 ms and 900 ms depending on how much I store. What I did was only store this data every x seconds and set flags on objects that need to be saved on the next run. Of course you can only do this if you're willing to lose some data when your program crashes.
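Here's a minimal sketch of that "flush dirty objects every x seconds" approach; save_to_datastore is a placeholder for whatever your real persistence call is, and the interval is arbitrary:

```python
import threading

SAVE_INTERVAL = 5.0   # seconds between flushes; tune to how much loss you can accept
objects = {}          # object id -> in-memory state
dirty = set()         # ids of objects changed since the last flush

def mark_dirty(obj_id):
    dirty.add(obj_id)

def flush_dirty(save_to_datastore):
    """Persist only the objects that changed, then schedule the next flush."""
    for obj_id in list(dirty):
        save_to_datastore(obj_id, objects[obj_id])
        dirty.discard(obj_id)
    threading.Timer(SAVE_INTERVAL, flush_dirty, args=(save_to_datastore,)).start()
```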
You'll need something to keep track of which servers are running and which server is responsible for which piece of data.
Set up something that checks whether your connection is alive. For example, send echo requests and echoes every x seconds. The sooner you detect a failure the better. Note, however, that if your reactor is blocked by some CPU-intensive task, it will not send your echo in time.
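A small sketch of such a liveness check using asyncio; send_ping, wait_for_pong and on_failure are placeholders for your own protocol hooks:

```python
import asyncio
import time

PING_INTERVAL = 5.0   # send an echo request every 5 seconds
PING_TIMEOUT = 2.0    # consider the peer dead if no echo comes back within 2 seconds

async def heartbeat(send_ping, wait_for_pong, on_failure):
    """Ping the peer periodically; report a failure as soon as an echo is missed."""
    while True:
        send_ping(time.time())
        try:
            await asyncio.wait_for(wait_for_pong(), timeout=PING_TIMEOUT)
        except asyncio.TimeoutError:
            on_failure()
            return
        await asyncio.sleep(PING_INTERVAL)
```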
If you're not in control of how much data comes in you'll have to slow down or penalize connections that would otherwise take up all of your server time.
EDIT: I only now see that you're looking into Redis. As far as I know it's a good queueing system; use that if you can. Implementing the stuff above would take a lot of time to get right.
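If you do go with Redis, a minimal pub/sub sketch with the redis-py client might look like this; the host and channel names are made up, and deliver_locally stands in for pushing the message to the users connected to this particular instance:

```python
import redis  # pip install redis

r = redis.Redis(host="your-redis-host", port=6379)

# Publisher side: any instance can fan a chat message out to all others.
def publish_message(room, message):
    r.publish(f"chat:{room}", message)

# Subscriber side: each instance listens and delivers to its locally connected users.
def listen(room, deliver_locally):
    pubsub = r.pubsub()
    pubsub.subscribe(f"chat:{room}")
    for item in pubsub.listen():
        if item["type"] == "message":
            deliver_locally(item["data"])
```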
I have a large number of machines (thousands and more) that every X seconds perform an HTTP request to a Jetty server to notify that they are alive. For what values of X should I use persistent HTTP connections (which limits the number of monitored machines to the number of concurrent connections), and for what values of X should the client re-establish a TCP connection each time (which in theory would allow monitoring more machines with the same Jetty server)?
How would the answer change for HTTPS connections? (Assuming CPU is not a constraint)
This question ignores scaling-out with multiple Jetty web servers on purpose.
Update: Basically the question can be reduced to the smallest recommended value of lowResourcesMaxIdleTime.
I would say that this is less of a jetty scaling issue and more of a network scaling issue, in which case 'it depends' on your network infrastructure. Only you really know how your network is laid out and what sort of latencies are involved in order to come up with a value of X.
From an overhead perspective, persistent HTTP connections will of course have some minor effect (well, I say minor, but it depends on your network), and HTTPS will again have a larger impact... but only from a volume-of-traffic perspective, since you are assuming CPU is not a constraint.
So from a Jetty perspective, it really doesn't need to be involved in the question; you seem to ultimately be asking for help optimizing bytes of traffic on the wire, so really you are looking for the best protocol at this point. Since with HTTP you have to send headers with each request, you may be well served looking at something like SPDY or WebSocket, which will give you persistent connections optimized for low round-trip network overhead. But... they seem sort of overkill for a heartbeat. :)
How about just making them send their requests at different times? When the first machine sends its request, pick a time to return in the response as that machine's next heartbeat time (and keep the id/time on the Jetty server); when the second machine sends its request, pick a different time to return to it.
In this way, each machine performs its heartbeat request at a different time, so there is no concurrency issue.
You can also use a random time for the first heartbeat in case all machines start up at the same time.
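A toy server-side sketch of that scheduling idea; the interval, spacing and data structures are invented for illustration:

```python
import time

HEARTBEAT_INTERVAL = 60.0   # each machine should report roughly once a minute
SLOT_SPACING = 0.05         # spread machines 50 ms apart
schedule = {}               # machine id -> next expected heartbeat time
_last_slot = [0.0]          # most recently handed-out slot

def handle_heartbeat(machine_id):
    """Record the heartbeat and return the time this machine should report next."""
    now = time.time()
    _last_slot[0] = max(_last_slot[0] + SLOT_SPACING, now + HEARTBEAT_INTERVAL)
    schedule[machine_id] = _last_slot[0]
    return _last_slot[0]    # include this in the HTTP response body
```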
My game engine right now consists of a working singleplayer part. I'm now starting to think about how to do the multiplayer part.
I have found out that many games actually don't have a real singleplayer mode, but when playing alone you are actually hosting a local server as well, and almost everything runs as if you were in multiplayer (except that the data packets can be passed over an alternate route for better performance)
My engine would need major refactoring to adapt to this model. There would be three possible modes: Dedicated client, Dedicated server and Client-Server (listen mode)
How often is the listen-server model used in the gaming industry?
What are the (dis)advantages of it?
What other options do I have?
I'll see if I can answer this the best I can:
How often is the listen-server model used in the gaming industry?
When it comes to most online games, you'll find that a large majority of games use a client-server architecture, though not always in the way you think. Take any Source game, for instance. Most will use a standard client-server with a master server architecture (to list games available), in that one person will host a dedicated server and anyone with a client can join it.
However, you have some games and services, take for instance Left 4 Dead, League of Legends, and some XBox Live games, that take a slightly different approach. These all use a client-server architecture with a controlling server. The main idea here is that someone creates a dedicated server that isn't "running" any game. The controlling server will create a "lobby" of sorts, and when the game is started, the controlling server will add them to a queue, and when it is that lobby's turn, it will select a matching dedicated server (in terms of location/speed, availability, numerous factors), and assign the players to that server. Only then will the server actually "run" the game. It's the same idea, but a little simplified, as the client doesn't need to "pick" a server, only join a game and let the controlling server do the work.
Of course, the biggest client-server model is the MMO model, where one or many servers run a persistent world that handles almost all data and logic. Some of the more famous games using this model are World of Warcraft, EverQuest, anything like that.
So where does a listen server fit in here? To be honest, not really that well, however, you will still find many games using it. For instance, most Source games allow listen servers to be created, and many XBox Live games do (it's been a while, but I believe Counter Strike did, as well as Quake 4, and many others). In general though, they seem to be frowned upon due to the advantages of the client-server model, which brings us to our next point.
What are the (dis)advantages of it?
First and foremost: performance. In a client-server model, the client will handle local changes (such as input, graphics, sounds, etc.) on each cycle of the game. At the end of the cycle, it will package up relevant data (such as: did the player move? If so, where to? Where is s/he looking now? Velocity? Did they shoot? If so, information on the bullet. Etc.) and send that to the server for processing. The server will take this data and determine whether everything is valid, such as: is the user moving in a way that indicates hacking (more on that later), is the move valid (anything in the way?), did the bullet from player 1 hit player 2?, and more. Then the server packages this up and sends it to the clients, which then update whatever is necessary, such as adjusting health if the player was shot, kicking the player if it is determined that they are hacking, etc.
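To make the shape of that exchange concrete, here is a toy example of the kind of per-tick update a client might send; the field names are invented and not taken from any real engine:

```python
import json
import time

def build_client_update(player):
    """One tick's worth of client input/state, sent to the server for validation."""
    return json.dumps({
        "player_id": player.id,
        "timestamp": time.time(),
        "position": player.position,        # server re-checks this against max speed
        "view_angles": player.view_angles,
        "velocity": player.velocity,
        "fired": player.fired_this_tick,    # if True, the server does the hit detection
    })
```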
A listen server, however, must deal with all of this at the same time. Since I assume you are familiar with programming, you probably realize how much power a game can rob from a computer, especially a poorly designed one. Adding on network processing, security processing, and more as well as the client's game, you can see where performance would take a serious hit, at least as far as just standard processing goes. Furthermore, most servers run on fast networks, and are servers designed to withstand network traffic. If a listen server's network is slow, the entire game will suffer.
Second, security. As stated earlier, one of the main things a server will do is determine whether a player is exploiting the game. You may have seen these as Punkbuster, VAC, etc. There is a very complicated set of rules that drives these programs, for instance, determining the difference between a hacker and just a very good player. It would be very bad for your game if you weren't able to catch hackers, but even worse if you took action against a falsely accused one.
A listen server will generally not be able to handle the client's game, the server processing, and the hack detection all at once. In most cases, detectors like Punkbuster are very hard, if not impossible, to get running on a listen server, because it's hard for them to function correctly without the necessary processing power: generally the game logic is prioritized over security, and if the detector is not allowed to process for one frame it may lose the data it needed to convict someone.
Lastly, gameplay. The biggest thing about servers is that they are persistent, meaning that even if everyone leaves, the server will continue to run. This is useful if you have a popular server that doesn't have much activity at night: people can still join when they are ready to play and don't have to wait for it to be brought back online.
In a listen server, the main disadvantage is that as soon as the client hosting the listen server leaves, the game must either be transferred to another player (creating a lull in the game that can last minutes in some cases) or end completely. This is not preferable on a big server, as the host must either stay online (wasting a slot in the server and his/her computer power, which could also slow the game) or end the game for everyone.
However, despite these problems, listen servers do have a few advantages.
Easy to set up: Most listen servers are nothing more than hitting "New game" and letting people join. This is easy for people who just want to play with their friends, and don't wish to have to try to find an empty dedicated server, or play with other people.
Good for testing: If one owns a dedicated server and wishes to change its configuration, it is generally a better idea to test the configuration first. The user would either have to create a backup of the dedicated server and go blindly into the changes (with the only option being to roll back if something goes wrong), create a new dedicated server to test them, or just create a simple listen server to test them. And as in point 1, these are generally easier to start up and configure. This is especially true as most dedicated servers are not within the administrator's immediate access (most dedicated servers are rented from a remote location). It takes much longer to push configuration changes, as well as commands for restarting, etc., to a remote location than to a machine the administrator is currently on.
Less resources: On most dedicated servers, a user with the same IP cannot connect to the server (meaning the client must either host the server or play; they cannot do both). If the client wishes to play on his/her own server, they will usually need a second machine to host it, or buy or rent a dedicated server so that they can actually play on it. A listen server requires only one machine, which may be the only thing the client can use.
In either case, both have advantages and disadvantages, and you need to weigh them with what you're willing to design and implement. From my experience, I believe that if you were to implement a listen server it would get used, if for nothing else than for a few users wishing to play around with friends, or test settings.
Lastly:
What other options do I have?
This is an industrial can of worms. In reality, any type of network architecture can be applied to video games. However, from what I've seen, like most internet communication, most boil down to some form of client-server model.
Please let me know if I didn't answer your question, or if you need something expanded, and I'll see what I can do.
How accurate is NTP for keeping a set of servers time synchronized?
I'm writing a service which requires a set of servers (some acting as clients, some as servers) synchronized to second level granularity. I'm wondering if NTP is the best thing to use, or if there's something better?
Should I run a ntp server on one of them, and have the others use that as their source? Any other recommendations/horror stories with NTP?
All the servers are linux.
Update: Service levels:
I'd like the one server to be accurate UTC (second-level, not microsecond or such), and I'd like all the other servers to have the same timestamp as that one server, regardless of whether it's accurate UTC or not. (Events are received by this one server from multiple locations at various intervals, and I require all those events to carry the same "relative" timestamp. No, I can't have the main server timestamp the events as they come in, because that would require storing an offset between when the event actually happened and when it was logged, which requires a whole lot of extra work, and that complicates matters needlessly.)
I've currently set up one server as a stratum 2 timeserver, using some stratum 1 GPS sources as servers in ntp.conf; on the other servers, I've set this server to be the sole server in ntp.conf.
I hope this will be enough.
Thank you!
NTP will keep you within a second well enough for most applications.
If you need higher precision, and all the servers are running *nix I would investigate implementing Precision Time Protocol. It involves multiple parent clocks and negotiation to find a reliable source in the network. This is the time protocol recommended for timestamping events in the power industry (e.g. accurate timestamping in the log files for relay actions and metering alarms aided in the investigation of the Northeast Blackout of 2003).
First off, you might have a look at the Wikipedia NTP page.
Basically, to start with (I preach this regularly) state what the service levels you want might be. Do you need accurate UTC? To what tolerance? That is, do you really need to know what time it is?
Or do you simply want precise synchronization among the systems?
How many machines are we talking about, and are they geographically distributed?
Some options:
accurate time: Set up at least one server as stratum 2, and have it reference at least 3 stratum 1 servers. If you have lots of servers, make that more than one; obviously you get more reliability by having no single point of failure.
precise synchronization: set up NTP peers.
accurate time and geographical distribution: more than one stratum 2 server, as above, with one "near" each cluster; they can peer at stratum 2 to improve the voting.
I don't think there's anything well known better than NTP that's available.
Update: Another question mentions PTP, the Precision Time Protocol (IEEE 1588). This is excellent for precise synchronization, but depends on multicast.
Also, it's worth considering getting a GPS time source.
Yes, set up one of your servers as your in-house NTP server, and sync the others to that. It gives you accuracy typically within milliseconds, as I remember.
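For reference, a minimal ntp.conf sketch of that setup; the hostnames are placeholders:

```
# On the in-house time server (stratum 2): reference several upstream sources.
server 0.pool.ntp.org iburst
server 1.pool.ntp.org iburst
server 2.pool.ntp.org iburst
driftfile /var/lib/ntp/ntp.drift

# On every other machine: use only the in-house server.
# server timeserver.example.internal iburst
```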
If any of your servers are way off -- and I can't remember exactly what constitutes 'way off', though by default ntpd gives up if the offset exceeds its panic threshold of roughly 1000 seconds -- NTP won't fix it on its own. There is a way to handle that automatically: start ntpd with the -g option so it is allowed to make the initial large correction, or step the clock once with ntpdate before starting it.
How are Massively Multiplayer Online RPG games built?
What server infrastructure are they built on? especially with so many clients connected and communicating in real time.
Do they manage with scripts that execute on page requests? or installed services that run in the background and manage communication with connected clients?
Do they use other protocols? because HTTP does not allow servers to push data to clients.
How do the "engines" work, to centrally process hundreds of conflicting gameplay events?
Thanks for your time.
Many roads lead to Rome, and many architectures lead to MMORPG's.
Here are some general thoughts to your bullet points:
The server infrastructure needs to support the ability to scale out... add additional servers as load increases. This is well-suited to Cloud Computing by the way. I'm currently running a large financial services app that needs to scale up and down depending on time of day and time of year. We use Amazon AWS to almost instantly add and remove virtual servers.
MMORPG's that I'm familiar with probably don't use web services for communication (since they are stateless) but rather a custom server-side program (e.g. a service that listens for TCP and/or UDP messages).
They probably use a custom TCP and/or UDP based protocol (look into socket communication)
Most games are segmented into "worlds", limiting the number of players that are in the same virtual universe to the number of game events that one server (probably with lots of CPU's and lots of memory) can reasonably process. The exact event processing mechanism depends on the requirements of the game designer, but generally I expect that incoming events go into a priority queue (prioritized by time received and/or time sent and probably other criteria along the lines of "how bad is it if we ignore this event?").
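As a toy illustration of that last point, a priority queue of incoming events might look like this; the importance values and the per-tick budget are invented:

```python
import heapq
import itertools
import time

_tiebreak = itertools.count()   # keeps heap entries comparable when priorities tie
event_queue = []                # min-heap of (importance, received_at, seq, event)

def enqueue_event(event, importance):
    """Lower importance numbers are handled first; importance encodes
    'how bad is it if we ignore this event?'"""
    heapq.heappush(event_queue, (importance, time.time(), next(_tiebreak), event))

def process_pending(handle, budget=100):
    """Drain up to `budget` events per server tick, most important first."""
    for _ in range(min(budget, len(event_queue))):
        importance, received_at, _seq, event = heapq.heappop(event_queue)
        handle(event)
```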
This is a very large subject overall. I would suggest you check over on Amazon.com for books covering this topic.
What server infrastructure are they built on? especially with so many clients connected and communicating in real time.
I'd guess the servers will be running on Linux, BSD or Solaris almost 99% of the time.
Do they manage with scripts that execute on page requests? or installed services that run in the background and manage communication with connected clients?
The server your client talks to will be running a daemon or service that sits idle listening for connections. For instances (dungeons), usually a new process is launched for each group, which would mean there is a dispatcher service somewhere managing this (analogous to a thread pool).
Do they use other protocols? because HTTP does not allow servers to push data to clients.
UDP is the protocol used. It's fast as it makes no guarantees the packet will be received. You don't care if a bit of latency causes the client to lose their world position.
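A bare-bones sketch of a daemon sitting in a loop reading UDP datagrams; the port and handle_packet are placeholders:

```python
import socket

def run_udp_daemon(handle_packet, host="0.0.0.0", port=9999):
    """Sit idle listening for datagrams; no delivery guarantees, as noted above."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    while True:
        data, addr = sock.recvfrom(4096)      # one client datagram
        reply = handle_packet(data, addr)     # game-specific processing
        if reply is not None:
            sock.sendto(reply, addr)          # fire-and-forget response
```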
How do the "engines" work, to centrally process hundreds of conflicting gameplay events?
Most MMOs have zones, which limits this to a certain number of people. For those that do have hundreds of people in one area, there is usually high latency: the server has to deal with hundreds of spells being sent its way, and it must calculate damage amounts for each one. For the big five MMOs I imagine there are teams of 10-20 very intelligent, mathematically gifted developers working on this daily, and there isn't an MMO out there that has got it right yet; most break after 100 players.
--
Have a look for Wowemu (there's no official site and I don't want to link to a dodgy site). This is based on ApireCore, which is an MMO simulator, or basically a reverse engineering of the WoW protocol. This is what the private WoW servers run off. From what I recall, Wowemu is:
mySQL
Python
However ApireCore is C++.
The backend for Wowemu is amazingly simple (I tried it in 2005, however) and is probably a complete oversimplification of the database schema. It does give you a good idea of what's involved.
Because MMOs by and large require the resources of a business to develop and deploy, at which point they are valuable company IP, there isn't a ton of publicly available information about implementations.
One thing that is fairly certain is that since MMOs by and large use a custom client and 3D renderer they don't use HTTP because they aren't web browsers. Online games are going to have their own protocols built on top of TCP/IP or UDP.
The game simulations themselves will be built using the same techniques as any networked 3D game, so you can look towards resources for that problem domain to learn more.
For the big daddy, World of Warcraft, we can guess that their database is Oracle because Blizzard's job listings frequently cite Oracle experience as a requirement/plus. They use Lua for user interface scripting. C++ and OpenGL (for Mac) and Direct3D (for PC) can be assumed as the implementation languages for the game clients because that's what games are made with.
One company that is cool about discussing their implementation is CCP, creators of Eve online. They have published a number of presentations and articles about Eve's infrastructure, and it is a particularly interesting case because they use Stackless Python for a lot of Eve's implementation.
http://www.disinterest.org/resource/PyCon2006-StacklessInEve.wmv
http://us.pycon.org/2009/conference/schedule/event/91/
There was also a recent Game Developer Magazine article on Eve's architecture:
https://store.cmpgame.com/product/3359/Game-Developer-June%7B47%7DJuly-2009-Issue---Digital-Edition
The Software Engineering radio podcast had an episode with Jim Purbrick about Second Life which discusses servers, worlds, scaling and other MMORPG internals.
Traditionally MMOs have been based on C++ server applications running on Linux communicating with a database for back end storage and fat client applications using OpenGL or DirectX.
In many cases the client and server embed a scripting engine which allows behaviours to be defined in a higher level language. EVE is notable in that it is mostly implemented in Python and runs on top of Stackless rather than being mostly C++ with some high level scripts.
Generally the server sits in a loop reading requests from connected clients, processing them to enforce game mechanics, and then sending out updates to the clients. UDP can be used to minimize latency and avoid the retransmission of stale data, but as RPGs generally don't employ twitch gameplay, TCP/IP is normally a better choice. Comet or BOSH can be used to allow bi-directional communication over HTTP for web-based MMOs, and WebSockets will soon be a good option there.
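A highly simplified sketch of that loop, assuming a tick-based simulation; every name here is illustrative rather than taken from any real engine:

```python
import time

TICK = 0.05  # 20 simulation updates per second

def game_loop(read_client_requests, apply_game_mechanics, broadcast_state, world):
    """Read requests, enforce the game mechanics, then push updates back out."""
    while True:
        started = time.time()
        for client_id, request in read_client_requests():
            apply_game_mechanics(world, client_id, request)  # validation + rules
        broadcast_state(world)                               # send updates to clients
        time.sleep(max(0.0, TICK - (time.time() - started)))
```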
If I were building a new MMO today I'd probably use XMPP, BOSH and build the client in JavaScript as that would allow it to work without a fat client download and interoperate with XMPP based IM and voice systems (like gchat). Once WebGL is widely supported this would even allow browser based 3D virtual worlds.
Because the environments are too large to simulate in a single process, they are normally split up geographically between processes each of which simulates a small area of the world. Often there is an optimal population for a world, so multiple copies (shards) are run which different sets of people use.
There's a good presentation about the Second Life architecture by Ian Wilkes who was the Director of Operations here: http://www.infoq.com/presentations/Second-Life-Ian-Wilkes
Most of my talks on Second Life technology are linked to from my blog at: http://jimpurbrick.com
Take a look at Erlang. It's a concurrent programming language and runtime system, and was designed to support distributed, fault-tolerant, soft-real-time, non-stop applications.