Backend architecture for online boardgame-like site - django

Hello Stackoverflowers,
We're developing an online board-game (think online monopoly) site using Python for the backend.
We use Django for the non-realtime parts (authentication, player profiles, rankings...). The chat server is implemented using socket.io and Tornado. The game server part is what causes us problems.
We currently (that could change) also use Tornado and socket.io; each Tornado instance is located at a gameX.site.com address on a (possibly) different server and hosts several games simultaneously (much like a chat server, in fact, except that messages go only to the users involved in the same game rather than to everyone).
What causes us trouble is how to update the Django instance (game log, score, and so on) as games progress. We'd also like to use Django for authentication: each player would ask the Django server to join a game and be given a disposable id/password pair just for it. Obviously we would have to communicate those to the game server in some way.
At first the chosen solution was to use something like Redis as a bidirectional message queue: Django would post the id/password to Redis, and Tornado would then query Redis on each incoming connection. A Django cron job would also run every minute or so to deal with the waiting messages. But we fear that a frequent and possibly long-running cron job would impede the main site, since the PostgreSQL database is hosted on the same server as Django (and some game servers may also run on the same machine).
We could alternatively wait for a player to request a ranking update before processing past game results, but we fear such an indefinite delay would skew the overall ranking (and experience) and possibly cause data loss.
We could use Celery/RabbitMQ to update the main database through the Django ORM from outside the Tornado processes, but would it be possible to use the same solution to communicate the temporary id/password to the game server? It doesn't look like you can post a message to Celery and retrieve it on the other side.
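The Redis hand-off we had in mind can be sketched with plain Redis lists (assuming redis-py; the key names and payload shape are made up, and the client is passed in so the same helpers would work from either Django or Tornado):

```python
import json

def push_credentials(redis_client, game_id, player_id, password):
    """Django side: enqueue a disposable id/password pair for a game server."""
    payload = json.dumps({"game": game_id, "id": player_id, "pw": password})
    redis_client.rpush("game:%s:credentials" % game_id, payload)

def pop_credentials(redis_client, game_id, timeout=5):
    """Tornado side: block up to `timeout` seconds for the next pair."""
    item = redis_client.blpop("game:%s:credentials" % game_id, timeout=timeout)
    if item is None:
        return None  # nothing arrived before the timeout
    _key, payload = item
    return json.loads(payload)
```

With BLPOP the Tornado side waits on the list instead of polling, which would avoid the cron job entirely for this direction.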
Thanks for your insight.

MMO Backend Architecture Questions and Review Request

I am making a simple MMO backend with Django as a personal/portfolio project.
(I am not actually making an MMO game.)
You can find the project here
So I have some questions about the architecture, since I designed it myself.
1. Architecture
1.1 Terminology
AuthServer= Web Server responsible for the authentication
UniverseServer= Web Server responsible for the persistent storage of the game (couldn't find a better name)
MatchmakingServer= Web Server that acts as intermediary between the client and the GameServerManager
GameServerManager= A cluster manager responsible for starting/stopping GameServerInstances
GameServerInstance= An instance of the game server
PlayerClient= The game client of the player
1.2 Flow
1.2.1 Authentication
PlayerClient logs in to AuthServer to obtain an AuthToken at /api/auth/login/
1.2.2 Retrieving Player Data
PlayerClient requests all the needed data from the UniverseServer at /api/universe/user/{...} (example: the characters list)
1.2.3 Connecting To A Game Server
PlayerClient requests an available game server from the MatchmakingServer with some parameters
The MatchmakingServer checks if there is an available GameServerInstance on the database with these parameters
If there is an available GameServerInstance, the MatchmakingServer responds to the PlayerClient with the IP and port of the GameServerInstance; otherwise it sends a request to the GameServerManager to spawn one
When the MatchmakingServer requests the GameServerManager to spawn an instance, it passes a ServerAuthToken so that the GameServerInstance can make authorized requests to /api/universe/server/{...}/ to update any value (example: the experience of a character)
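The decision in 1.2.3 could be sketched as a plain function (record fields like `slots_free` and the `spawn` callback are illustrative assumptions, not part of the actual project):

```python
def find_or_spawn(instances, params, spawn):
    """Return (ip, port) of a matching GameServerInstance, spawning one if needed."""
    for inst in instances:
        # An instance matches if it was started with the requested parameters
        # and still has room for another player.
        if inst["params"] == params and inst["slots_free"] > 0:
            return inst["ip"], inst["port"]
    # No match: ask the GameServerManager to spawn a new instance, passing a
    # ServerAuthToken so it can later call /api/universe/server/{...}/ itself.
    new_inst = spawn(params)
    return new_inst["ip"], new_inst["port"]
```

In the real MatchmakingServer, `instances` would come from the database query and `spawn` would be the HTTP call to the GameServerManager.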
In the Django project I am implementing AuthServer, UniverseServer, and MatchmakingServer as separate apps to stay as modular as possible.
First of all, I would love for someone experienced in this field to review the whole architecture.
And the questions:
Does this architecture make sense? What could I change to improve it (even naming, for more standard names/functionality/etc.)?
What tools can be used as the GameServerManager? Can Kubernetes/Agones do that? Does AWS or DigitalOcean have anything for this?
I am making this project for my portfolio, but does this architecture/stack have the capability to go to production, even for an MMO with a low player base like ROTMG?
How could this be scaled? Separate the apps into different projects and host everything individually? Use load balancers? Use proxies?
Is this considered cloud architecture or cloud orchestration or something else?
Any advice is welcome.
From the questions you can tell that I have minimal knowledge in this field, so please be as detailed as possible (and/or provide resources to read more).
Thanks for your time :)

Handling long requests

I'm working on a long request in a Django app (nginx reverse proxy, MySQL DB, Celery-RabbitMQ-Redis set) and have some doubts about the solution I should apply:
Functioning: One feature of the app allows users to migrate thousands of objects from one system to another. Each migration is logged in a DB, and users are given the option to download the migration history in CSV format: which objects have been migrated and with which status (success, errors, ...).
To get the history, a GET request is sent to a Django view, which returns, after serialization and rendering to CSV, the download response.
Problem: the serialization and rendering processes, for a large set of objects (e.g. 160,000), are quite long and the request times out.
Some solutions I was thinking about/found thanks to previous searches are:
Increasing the amount of time before timeout: easy, but I have seen everywhere that this is a global nginx setting and would affect every request on the server.
Using an asynchronous task handled by Celery: the concept would be to make an initial request to the server, which would launch the serializing and rendering task with Celery and give a special HttpResponse to the client. The client would then regularly ask the server whether the job is done, and the server would deliver the history at the end of processing. I like this one, but I'm not sure how to technically implement it.
Creating and temporarily storing the CSV file on the server, and giving the user a way to access and download it. I'm not a big fan of that one.
So my question is: has anyone already faced a similar problem? Do you have advice for the technical implementation of solution #2, or a better solution to propose?
Thanks!
Clearly you should use Celery + RabbitMQ/Redis. If you look at the docs, it's not that hard to set up.
The first question is whether to use RabbitMQ or Redis. There are many SO questions about this with good information about pros/cons.
The implementation in Django is really simple. You can just wrap Django functions with Celery tasks (with the @task decorator) and they'll become async, so this is the easy part.
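For example (names invented, and the Celery wiring reduced to comments so the core stays plain Python):

```python
import csv
import io

# from celery import task   # in the real project, uncomment and apply @task

# @task
def render_history_csv(rows):
    """Serialize migration-history rows to CSV text; this is the slow part
    that should run in a Celery worker, not in the request cycle."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["object_id", "status"])
    writer.writerows(rows)
    return buf.getvalue()

# The view that starts the export would call render_history_csv.delay(rows)
# and return the task id; the polling view would check
# AsyncResult(task_id).ready() and, once true, return .result wrapped in a
# download HttpResponse.
```

That task id + polling view pair is exactly your solution #2.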
The problem I see in your project is that the server that handles HTTP traffic is the same server running the long process. That can affect performance and user experience even if Celery is running in the background. Of course, that depends on how much traffic you are expecting on that machine and how many migrations can run at the same time.
One of the things you configure in Celery is the number of workers (concurrent processing units) available, so the number of cores in your machine will matter.
If you need to handle HTTP calls quickly, I would suggest delegating the migration process to another machine. Celery/Redis can be configured that way. Let's say you've got two servers: one would handle only normal Django calls (no Celery) and trigger Celery tasks on the other server (the one that actually runs the migration process). Both servers can connect to the same database.
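A hypothetical routing setup for that split (module and queue names are made up): tasks sent to a dedicated queue are only consumed by a worker started on the second machine with `celery worker -Q migrations`, while the web box runs no workers at all.

```python
# settings.py (illustrative): route the heavy task to its own queue so
# only the dedicated worker machine picks it up.
CELERY_ROUTES = {
    "myapp.tasks.run_migration": {"queue": "migrations"},
}
```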
But this is just an infrastructure optimization and you may not need it.
I hope this answers your question. If you have specific Celery issues it would be better to create another question.

Distributed game server using tornado

We are developing a gamified environment which facilitates learning of 'boring' material. Recently, we launched the 1.0 release, which comprises the following modules:
Django based web portal for learning courses, viewing leaderboard (scoring), personal activity and achievements.
Tornado-based real-time feeds and notification service. We utilized sockjs-tornado, developed by mrjoes. We benefit from the shared session: when a user logs in to the Django site, a channel to our Comet backend is automatically opened.
Tornado-based asynchronous multiplayer game server. Again, we used sockjs-tornado for connecting and serving players (sockets) and synchronizing game-loop events across the network.
Now I am working on the scalability and high-availability features of the system. The main challenge is to distribute load across game cluster nodes properly and consistently (no data loss, failover in case of a crash, rebalancing if one of the game servers (loops) suddenly crashes). The main goal is to make the client's interaction with the game as seamless as possible: the client shouldn't have to take additional actions, such as manually reconnecting if the game crashed due to a server failure. The figure depicts a possible architecture of the distributed game server.
When actions are performed on .../game on the Django site (CRUD actions), the Tornado-based Game Router is notified in real time through a kind of oplog collection (a capped collection in MongoDB which is tailed by the Game Router). If this is a new game, the Game Router determines the next host (game server) for the game by a special algorithm (fastest heartbeat trip, or minimum number of live (active) games). The Game Router then notifies the appropriate Loop by publishing a message to a topic exchange.
If any game server crashes, the Game Router will be immediately notified and will rebalance the load by picking up the idle games from the memcached cluster.
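The routing step might be sketched framework-free like this; in production, `events` would be a pymongo tailable cursor on the capped collection and `notify` would publish to the topic exchange (event and field names here are assumptions):

```python
def route_events(events, pick_host, notify):
    """Consume oplog events from the Django site: pick a host for each new
    game (by the heartbeat / live-game-count algorithm behind `pick_host`)
    and notify the appropriate game loop about every event."""
    assignments = {}
    for ev in events:
        if ev["op"] == "create":
            host = pick_host()                 # fastest heartbeat or fewest live games
            assignments[ev["game_id"]] = host
            notify(host, ev)
        elif ev["game_id"] in assignments:
            notify(assignments[ev["game_id"]], ev)  # forward to the assigned loop
    return assignments
```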
In case of failure, how can the client seamlessly reconnect to the new game server? I don't want to open a redundant channel (socket connection).
Am I going the right way or not? Please provide me with good approaches, critiques, and techniques. It would be nice if you shared your thoughts with diagrams.
Thank you in advance.

Getting "idle in transaction" for postgresql with django

We are using Django 1.3.1 and Postgres 9.1
I have a view which just fires multiple selects to get data from the database.
The Django documentation mentions that when a request is completed, a ROLLBACK is issued if only SELECT statements were fired during the view call. But I am seeing a lot of "idle in transaction" entries in the log, especially when I have more than 200 requests. I don't see any COMMIT or ROLLBACK statements in the Postgres log.
What could be the problem? How should I handle this issue?
First, I would check out the related post What does it mean when a PostgreSQL process is “idle in transaction”? which covers some related ground.
One cause of "Idle in transaction" can be developers or sysadmins who
have entered "BEGIN;" in psql and forgot to "commit" or "rollback".
I've been there. :)
However, you mentioned your problem is related to having a lot of
concurrent connections. It sounds like investigating the "locks" tip
from the post above may be helpful to you.
A couple more suggestions: this problem may be secondary. The primary
problem might be that 200 connections is more than your hardware and
tuning can comfortably handle, so everything gets slow, and when things
get slow, more things are waiting for other things to finish.
If you don't have a reverse proxy like Nginx in front of your web app,
consider adding one. It can run on the same host without additional
hardware. The reverse proxy regulates the number of connections to the
backend Django web server, and thus the number of database connections.
I've been here before with too many database connections, and this is
how I solved it!
With Apache's prefork model, there is a 1:1 correspondence between the
number of Apache workers and the number of database connections,
assuming something like Apache::DBI is in use. Imagine someone connects
to the web server over a slow connection. The web and database server
take care of the request relatively quickly, but then the request is
held open on the web server unnecessarily long as the content is
dribbled back to the client. Meanwhile, the database connection slot is
tied up.
By adding a reverse proxy, the backend server can quickly deliver a
reply back to the reverse proxy and then free the backend worker and
database slot. The reverse proxy is then responsible for getting the
content back to the client, possibly holding open its own connection
for longer. You may have 200 connections to the reverse proxy up front,
but you'll need far fewer workers and DB slots on the backend.
If you graph the db slots with MRTG or similar, you'll see how many
slots you are actually using, and can tune down max_connections in
PostgreSQL, freeing those resources for other things.
You might also look at pg_top to help monitor what your database is up to.
I understand this is an older question, but this article may describe the problem of idle transactions in django.
Essentially, Django's TransactionMiddleware will not explicitly COMMIT a transaction if it is not marked dirty (usually triggered by writing data). Yet it still BEGINs a transaction for all queries, even read-only ones. So Postgres is left waiting to see if any more commands are coming, and you get idle transactions.
The linked article shows a small modification to the transaction middleware to always commit (basically removing the condition that checks whether the transaction is_dirty). I'll be trying this fix in a production environment shortly.
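A hedged sketch of what such a modification might look like against the Django 1.3-era API (this is my reconstruction, not the article's exact code, and it is untested):

```python
# Subclass the stock middleware and commit unconditionally instead of
# only when the transaction is marked dirty, so read-only requests also
# end their transaction and Postgres is not left "idle in transaction".
from django.db import transaction
from django.middleware.transaction import TransactionMiddleware

class AlwaysCommitMiddleware(TransactionMiddleware):
    def process_response(self, request, response):
        if transaction.is_managed():
            transaction.commit()  # no is_dirty() check: always COMMIT
            transaction.leave_transaction_management()
        return response
```

You would then reference this class in MIDDLEWARE_CLASSES in place of the stock TransactionMiddleware.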

Django + CoffeeScript: Real-time video application with io sockets

I have been trying to solve this for 2 weeks and I have not been able to reach a solution.
Here is what I am trying to do:
I need a web application in which users can upload a video; the video is going to be transformed using OpenCV's Python API. Since I have a Python API for OpenCV, I decided to create the web app using Django. Everything is fine up to that point.
The problem is that the video transformation is a very long process, so I was trying to implement some real-time capability in order to show the user the video as it is transformed; in other words, I transform a frame and show it to the user immediately. I am trying to do this with CoffeeScript and io sockets following some examples; however, I haven't been successful.
My question is: what would be the right approach to add real-time capabilities to a Django application?
I'd recommend using a non-django service to handle the websockets. Setting up websockets properly is tricky on both the client and server side. Look at pusher.com for a free/cheap solution that will just work and save you a whole lot of hassle.
The initial request to start rendering should kick off the long-lived process, and return with an ID which is used to listen to the websocket for updates.
Once you have your websockets set up, you can send messages to the client about each finished frame. Personally I wouldn't try to push the whole frame down the websocket, but rather just send a message saying the frame is done with a URL to get the frame. Then normal HTTP with its caching and browser niceties moves the big data.
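For example, the per-frame message could be as small as this (field names and URL layout are invented for illustration):

```python
import json

def frame_done_message(job_id, frame_no):
    """Build the tiny notification sent down the websocket for each
    finished frame: just an event name plus a URL where the client can
    fetch the rendered frame over plain HTTP."""
    return json.dumps({
        "event": "frame_done",
        "job": job_id,
        "frame": frame_no,
        "url": "/renders/%s/frames/%d.jpg" % (job_id, frame_no),
    })
```

The client listens for `frame_done` and sets an `<img>` source to the URL, letting the browser's normal caching handle the heavy bytes.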
You're definitely not choosing the easy path. The easy path is to have your long-lived render task update the render state in the database, and have the client poll that. Extra server load, but a lot simpler.
Django itself is really focused on one kind of web interface: the HTTP request/response pattern. To maintain a persistent connection with clients, which socket.io really makes dead simple, you need to diverge a bit from a normal Django installation.
This article discusses the issue of doing real-time with Django, with the help of Orbited and Twisted. It's rather old, and it relies on Comet, which is not the preferred way of doing real-time these days.
You might benefit a lot by going for Socket.io on the client and something like Tornado (wiki) + the Tornado client for Socket.io. But if you really want to stick with Django for the web development (which Tornado also provides), you would need to make the two work together internally, each handling its particular use case.
Finally, this other article discusses how to make Django work with gevent (a coroutine-based networking library for Python) and Socket.io, which might well be your best option otherwise.
Don't hesitate to post questions/comments as they pop up!