I got everything I wanted to do with django-celery working on my development machine. More specifically, the app accepts photo URLs, which are turned into tasks, and the same machine then downloads the photos.
Now what I want to do is put the django code on heroku and the celery tasks on a dedicated computer that will be kept in the office.
I don't know what the next step is though. How do I tell the django app to connect to the office computer? What is the process for setting up the office computer to accept tasks from the django app? How do I give the office computer credentials for the django app's database so that it can connect and update the models?
Ideally, I am looking to put something like this in my settings.py file:
remote_worker = '123.2.4.23:1234'
and on the office computer
tasks = 'photos/tasks.py'
remote_app = 'herokuapp123.com/myapp'
username = 'me'
password = 'pw'
I know there are a lot of questions. Any help or pointers would be appreciated!
This largely depends on which AMQP backend you are using for Celery. If you are using the default (RabbitMQ) you will need to do one of the following:
Install RabbitMQ on the Heroku server, expose its port to your business IP through the firewall and configure your office computer to connect to it
Install RabbitMQ locally on your business computer and configure celery on Heroku to connect to it
Install RabbitMQ on both sides and bridge them.
Alternatively you can integrate the Heroku server into your own business network using a VPN solution and have them talk to each other directly (because, after all, you probably don't want to transmit AMQP packets unencrypted over the public internet).
Scenario 1 is probably the easiest to set up, as Heroku already provides the plugin infrastructure to do so. Scenario 2 is probably not what you want, as you will have to punch a hole in your business firewall for that. Both scenarios 1 and 2 will have latency and reliability issues, as routing AMQP traffic over the internet is not going to be fast or reliable. You will have dropped messages, and Celery will keep retrying until it succeeds or reaches the maximum number of failures. However, AMQP was designed to handle network issues; they just may affect your performance if that is critical. But then again, in that case you should reconsider putting the Celery workers on a business desktop.
Scenario 3 is probably the best in terms of reliability, but also more difficult to set up. Choose based on your needs.
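Whichever scenario you pick, the concrete wiring is small: with django-celery both sides mostly just need to agree on the broker URL. A rough sketch (the host, port, credentials and vhost below are placeholders, not anything Heroku or Celery gives you):

# settings.py, shared by the Heroku app and the office worker
# placeholder credentials/host -- point this at wherever RabbitMQ ends up living
BROKER_URL = 'amqp://myuser:mypassword@123.2.4.23:5672/myvhost'

# the office machine runs a worker against the same Django project, e.g.
#   python manage.py celery worker --loglevel=info
# and it also needs a DATABASES entry for the same database so tasks can update your models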
First of all, please let me be clear that I am a Windows user and very new to the web world. For the past few months I have been learning both Python and Django, and it has been a great experience for me. Now I have somehow created a small project that I would like to deploy on a production server. Since Django has its built-in development server, there was no problem for me so far. But now that I have to deploy it to a production server, I googled around and found Nginx + uWSGI or Nginx + Gunicorn as the best options for it. And as uWSGI and Gunicorn are incompatible with Windows, I think I should adopt Ubuntu or another Unix system.
So my questions are:
Just to be clear, as I will have to work with one of the above, please explain to me why I need two servers?
If I have to adopt the Ubuntu environment, do I have to learn Ubuntu shell scripting, SSH and other stuff? Or will the hosting provider help me do that?
Please let me know what else I need for the above.
Thank you so much for your time, and please pardon me if this is a lame question. Hoping for positive responses.
A typical configuration involves two server processes (which can be run together on the same actual hardware or virtual server) so that the proxy server in front can buffer slow clients. For instance: a slow client will connect to nginx with a request. Nginx will pass the request on to Gunicorn and Gunicorn will respond. Nginx will then consume the Gunicorn response immediately, freeing up the Gunicorn resources right away. At that point, the slow client can take as much time as it wants to consume the response from Nginx without tying up much in the way of server resources. Alternatives to the two-server-process model are to use async workers with Gunicorn and put Gunicorn itself in front, or to use an async-sync combo like Waitress. Nginx in front has the added benefit of doubling as a ready-to-use statics server, though.
Note that "slow clients" can describe: mobile phones that lose their connection and leave the TCP socket hanging until timeout mid-request; mobile phones that are just slow; unreliable connections of all types; hostile denial-of-service clients who are deliberately trying to use server resources; sometimes any old connection that has a hiccup or malfunction for any reason. So this is a problem that will affect nearly any site.
You won't need shell scripting per se but getting used to Ubuntu will take some time. There is a lot to learn even outside of scripting, like how to use the package manager, how to configure packages once they're installed in ways that won't confound future updates, etc. And you will definitely have to learn to use SSH; it is one of the most fundamental server administration tools in the *nix world.
An alternative to learning to use Ubuntu or another server platform is to use a Platform-as-a-Service option like Heroku, as PaaS hosting providers really will take care of all of that stuff for you. I recommend this approach. That having been said, even though I think PaaS is a good option for people who want to focus on development and not server admin regardless of their level of skill, it's also true that a little bit of experience with Linux server platforms goes a long way in helping you to understand the environment that your code runs in. So even if you go with PaaS, you would still benefit from tinkering with Ubuntu a little (or a lot).
Another benefit from a PaaS is that normally their infrastructure handles the Nginx part of the deal (buffering of slow requests via proxy). This is the case with Heroku, for instance. So you won't have to worry about that part of the infrastructure at all.
This part of the question is too broad to answer, but let me know in the comments if you need clarification.
I'm doing it almost like in this tutorial: http://michal.karzynski.pl/blog/2013/06/09/django-nginx-gunicorn-virtualenv-supervisor/
Nginx is my proxy to the Django app running on Gunicorn and it also serves the static files; virtualenv provides my Python environment, and Supervisor keeps my app running.
It's possible you will run into some errors if you're not using PostgreSQL; ask and I will help (I used MySQL in the past, now it's PostgreSQL).
Firstly, there's no need to use Ubuntu if you're happier with Windows. I don't know if nginx works on Windows, but I'd be very surprised if it doesn't (in fact, here are the nginx docs for installing on Windows). Apache, meanwhile, definitely does work on Windows. The Django documentation has a full explanation of how to set up Apache/mod_wsgi to serve Django.
You don't need two servers. I'm not sure why you think you do: the usual reason for that is to have the static assets on a separate server, but you don't mention that as a reason. Since you're only talking about a small site, though, you don't even need to do that. One server configured to serve both Django and the static assets will do fine. Again, the docs explain exactly how to do that.
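As a small illustration (paths are placeholders), serving the static assets from the same server mostly comes down to these settings plus running collectstatic, with the web server aliasing the STATIC_URL prefix to STATIC_ROOT as the docs describe:

# settings.py -- placeholder paths
STATIC_URL = '/static/'
STATIC_ROOT = '/var/www/mysite/static/'  # where `manage.py collectstatic` gathers the files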
Hello Stackoverflowers,
We're developing an online board-game (think online monopoly) site using Python for the backend.
We use Django for the non-realtime stuff (authentication, player profiles, ranking...). The chat server is implemented using socket.io and Tornado. The game server part is what caused us problems.
We currently (that could change) also use Tornado and socket.io; each Tornado instance is located at a gameX.site.com address on a (maybe) different server and hosts several games simultaneously (much like a chat server in fact, except that messages would not go to all users but only to the ones involved in the same game).
What causes us trouble is how we update the Django instance (game log, score, and so on) as games progress. We would also like to use Django for authentication, as each player would ask the Django server to join a game and be given a disposable id/password pair just for it. Obviously we would have to communicate those to the game server in some way.
At first the chosen solution was to use something like Redis as a bidirectional message queue: Django would post the id/password to Redis, and Tornado would then query Redis on each incoming connection. A Django cron job would also run every minute or so to deal with the waiting messages. But we fear that such a frequent and possibly long-running cron job would impede the main site, since the PostgreSQL database is hosted on the same server as Django (and some game servers may also run on the same machine).
We could alternatively wait for a player to request a ranking update before processing past game results, but we fear such an indefinite delay would skew the overall ranking (and experience) and could possibly cause data loss.
We could use Celery/RabbitMQ to update the main database through the Django ORM from outside the Tornado processes, but would it be possible to use the same solution to communicate the temporary id/password to the game server? It doesn't look like you can post a message to Celery and retrieve it on the other side.
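To make the Celery idea concrete, what we picture on the worker side is roughly this (the model and task names are just placeholders):

# tasks.py in the Django project -- rough sketch, names are hypothetical
from celery import shared_task
from games.models import GameResult  # placeholder model

@shared_task
def record_game_result(game_id, scores):
    # runs on a Celery worker that has normal Django ORM/database access,
    # so the Tornado process never touches the database directly
    GameResult.objects.create(game_id=game_id, scores=scores)

# the Tornado side would just fire record_game_result.delay(game_id, scores) and move on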
Thanks for your insight.
In my project, I use Django and deploy it on Heroku. On Heroku, I use the uWSGI server (in asynchronous mode), and the database is MySQL (on AWS RDS). I use 7 dynos to scale the Django app.
When I run a stress test at 600 requests/second, the timeout is 30 seconds.
My server returns timeouts for more than 50% of the requests.
Any ideas can help me improve my server performance?
If your async setup is right (and this is the hardest part), then your only solution is adding more dynos. If you are not sure about django+async (or if you have not done any particular customization to make them work together), you probably have a broken setup (no concurrency at all).
Take into account that uWSGI async mode could mean dozens of different setups (gevent, ugreen, callbacks, greenlets...), so some detail on your configuration would help.
After reading a lot of blog posts, I decided to switch from crontab to Celery for my medium-scale Django project. There are a few things I didn't understand:
1- I'm planning to start a micro EC2 instance which will be dedicated to RabbitMQ; would this be sufficient for small-to-medium task loads (such as dispatching periodic e-mails via Amazon SES)?
2- Does the computation of tasks occur on the Django server or on the RabbitMQ server (assuming RabbitMQ is on a separate server)?
3- When I need to grow my system and have 2 or more application servers behind a load balancer, do these two Celery machines need to connect to the same RabbitMQ vhost? Assume the application servers are carbon copies, the tasks are the same and everything is in sync at the database level.
I don't know the answer to this question, but you can definitely configure it to be suitable (e.g. use -c1 for a single-process worker to avoid using much memory, or use the eventlet/gevent pools); see also the --autoscale option. The choice of broker transport also matters here: the ones that are not polling are more CPU-efficient (rabbitmq/redis/beanstalk).
Computing happens on the workers, the broker is only responsible for accepting, routing and delivering messages (and persisting messages to disk when necessary).
To add additional workers, these should indeed connect to the same virtual host. You would only use separate virtual hosts if you wanted applications to have separate message buses.
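As a hypothetical illustration of points 1 and 3 (host, credentials and vhost are placeholders): every application server carries the same broker settings, vhost included, and worker memory use can be capped by limiting concurrency as mentioned above:

# settings.py on every application server -- placeholder host, credentials and vhost
BROKER_URL = 'amqp://celeryuser:celerypass@rabbit.example.com:5672/myapp'
CELERYD_CONCURRENCY = 1   # single worker process, as with the -c1 option mentioned above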
I currently have a growing Django production server that has all of the front end and backend services running on it. I could keep growing that server larger and larger, but instead I want to try and leave that main server as my backend server and create multiple front end servers that would run apache/nginx and remotely connect to the main production backend server.
I'm using slicehost now, so I don't think I can benefit from having the multiple servers run on an intranet. How do I do this?
The first step in scaling your server is usually to separate the database server. I'm assuming this is all you meant by "backend services", since you haven't given us any more details.
All this needs is a change to your settings file. Change DATABASE_HOST from localhost to the new IP of your database server.
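On recent Django versions the same change lives inside the DATABASES dict; here is a sketch with placeholder values (on very old versions it is the standalone DATABASE_HOST setting named above):

# settings.py -- placeholder values
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'mydb',
        'USER': 'myuser',
        'PASSWORD': 'mypassword',
        'HOST': '10.0.0.5',  # was 'localhost' while the database ran on the same box
        'PORT': '5432',
    }
}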
If your site is heavy on static content, creating a separate media server could help. You may even look into a CDN.
The first step usually is to separate the server running the actual Python code from the database server. Any background jobs that do processing would probably run on the database server. I assume that when you say front end server, you actually mean a server running Python code.
Now, as every request will have to do a number of database queries, latency between the web server and the database server is very important. I don't know if Slicehost has a feature to let you create two virtual machines that are "close" in terms of network latency (a quick Google search did not find anything). They seem like nice guys, so maybe you could ask them if they have such a service or could make an exception.
Anyway, when you do have two machines on Slicehost, you could check the latency by simply pinging between them. When you have the result you will probably know whether this is feasible at all.
Further steps depend on your application. If it is media-heavy, then maybe using a separate media server would make sense. Otherwise the normal step is to add more web servers.
--
As a side note, I personally think it makes more sense to invest in real dedicated servers with dedicated network equipment for this kind of setup. This of course depends on what budget you are on.
I would also suggest looking into Amazon EC2 where you can provision servers that are magically close to each other.