Some context:
I have a Django server hosted on Heroku, with Waitress serving requests.
A single dyno has several threads that handle requests simultaneously.
The server, among other things, handles a torrent of events reported by numerous clients.
The events are written to a redis cache, to be later flushed to the DB.
My Goal:
I would like to optimize this by having the redis cache running on the same dyno that handles requests.
Each dyno will have its own local cache server (which is shared by all worker threads).
Events will be pushed to the local cache (thus improving the response time). A periodic job (on each dyno) will then collect the events from the cache and flush them to the DB.
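For illustration, here is a minimal sketch of the pattern I have in mind (the Redis connection details and the Event model are placeholders, not my actual code):
import json
import redis

# Sketch only: assumes a local Redis on localhost; "Event" is a hypothetical model.
r = redis.Redis(host="localhost", port=6379, db=0)

def record_event(payload):
    # Fast path inside the request handler: just push the event to the local cache.
    r.rpush("pending_events", json.dumps(payload))

def flush_events():
    # Periodic job on the same dyno: drain the list and write the batch to the DB.
    batch = []
    while True:
        raw = r.lpop("pending_events")
        if raw is None:
            break
        batch.append(json.loads(raw))
    # Event.objects.bulk_create([Event(**e) for e in batch])  # hypothetical model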
My problems:
How do I add a Redis cache to my dyno (not as an add-on)? I understand it is possible, but I wasn't able to do it. ref: Is redis on Heroku possible without an addon?
A different cache server that I can add to my dyno would also be a good option.
Thanks,
No point in repeating the detailed answer given at Is redis on Heroku possible without an addon?
It's possible, but there's no reason why a sane developer would go through this just to optimize something that is completely negligible and totally unscalable.
If you must have a local cache, try using local memory caching as your intermediate cache.
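For example, a minimal Django settings sketch using the local-memory backend (shared by all threads within a single Waitress process, but not across dynos) might look like this:
# settings.py (sketch)
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
        "LOCATION": "per-process-events",  # arbitrary name; each process gets its own store
    },
}
Keep in mind this cache lives inside the process, so anything buffered there is lost whenever the dyno restarts.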
Related
I have a working Django application that is running locally using an sqlite3 database without problems. However, when I change the Django database settings to use my external AWS RDS database, all my pages start taking upwards of 40 seconds to load. I have checked my AWS metrics and my instance is not even close to being fully utilized. When I make a request to a view with no database read/write operations, I also get the same problem. My activity monitor shows my local CPU spiking with each request. It shows a process named 'WindowsServer' using most of the CPU during each request.
I am aware more latency is expected when using a remote database, but I don't think this should result in 40-second page lags. What other problems could be causing this behaviour?
[Screenshots: AWS database monitoring; local machine activity monitor]
So your computer is connecting to the server at Amazon, and that's where the latency problem comes from. Production servers should be in the same place as the DB servers (or should have a very, very good connection, so the latency is lowered as much as possible).
--edit--
So we need more details. What is your ISP? What are your connection properties? Uplink, downlink? What are the pings to the AWS servers?
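To get concrete numbers from the Django side, a rough sketch like the following (using a trivial query as a ping substitute) can show the per-round-trip latency to RDS:
import time
from django.db import connection

def measure_db_latency(samples=5):
    # Time a trivial query a few times to estimate round-trip latency to the DB.
    timings_ms = []
    for _ in range(samples):
        start = time.perf_counter()
        with connection.cursor() as cursor:
            cursor.execute("SELECT 1")
            cursor.fetchone()
        timings_ms.append((time.perf_counter() - start) * 1000)
    return timings_ms
If a single SELECT 1 already takes tens or hundreds of milliseconds, a page that issues many queries will multiply that latency accordingly.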
I am using Redis for caching and queuing in a Flask application. However, I observe that my cache also contains worker-job-related entries, which are breaking my cache data fetching code.
Is there any way I can use Redis to serve both purposes together?
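For example, would something like pointing the cache and the queue at separate logical databases be the right approach? A sketch of what I mean (the host and db numbers are arbitrary placeholders):
import redis

# Different logical Redis databases for cache entries vs. queue/job entries.
cache_redis = redis.Redis(host="localhost", port=6379, db=0)
queue_redis = redis.Redis(host="localhost", port=6379, db=1)

cache_redis.set("page:home", "<rendered html>", ex=300)      # cache entry with a TTL
queue_redis.lpush("jobs:default", "serialized-job-payload")  # queue-style entry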
In the Django docs for setting up mod_wsgi, the tutorial notes:
Django doesn’t serve files itself; it leaves that job to whichever Web
server you choose.
We recommend using a separate Web server – i.e., one that’s not also
running Django – for serving media. Here are some good choices:
Nginx
A stripped-down version of Apache
I understand this might be due to wasted resources when Apache spawns new processes to serve each static file, which Nginx avoids. However, Apache's (newish?) Event MPM seems to act similarly to an Nginx instance handing off requests to an Apache worker MPM. Therefore I'd like to ask: instead of setting up Nginx to be a reverse proxy for Apache, would using an Apache Event MPM be sufficient for serving static files in Apache?
Apache doesn't spawn a new process for each static file. Apache keeps persistent processes to handle concurrent and subsequent requests, just like nginx. The difference is that nginx uses a fully async model, whereas Apache relies on processes and/or threading for concurrency, although the event MPM now uses an async model for initial request acceptance and keep-alive connections. For the majority of people, Apache alone is still a more than acceptable solution. So don't get ahead of yourself if you are just starting out and think you need a Google/Facebook-scale solution from the outset.
More important than a separate web server is that, if using Apache/mod_wsgi, you serve the static files under a different host name. That way you avoid heavyweight cookie information being sent for all static file requests. You can do this using virtual hosts in Apache. Also ensure you are using daemon mode of mod_wsgi for running the Django application, as that is a better architecture and provides many more options for setting timeouts, so your application can recover from various situations which might otherwise cause the server to lock up when overloaded.
For a system which provides a better out of the box configuration and experience than using Apache/mod_wsgi directly and configuring it yourself, look at using mod_wsgi-express.
https://pypi.python.org/pypi/mod_wsgi
http://blog.dscpl.com.au/2015/04/introducing-modwsgi-express.html
http://blog.dscpl.com.au/2015/04/using-modwsgi-express-with-django.html
http://blog.dscpl.com.au/2015/04/integrating-modwsgi-express-as-django.html
The advice about separating the webservers has two advantages. One is clearly outlined by Graham. The other is "predictable resource consumption".
The number of resources per HTML page differs. Leaving one webserver to serve the application and the other to serve static resources has the advantage that you know exactly how many concurrent visitors you can serve: the MaxClients setting of Apache.
If this slows down the loading of images, note that those static-file webservers need very few modules and no measurable amount of CPU power, so a one-core machine with SSD disks is all you need, and scaling is cheap.
As Graham indicates, it starts with a STATIC_URL that has a different hostname. Run it on the same server at the start. When scaling up, tie that hostname to a reverse proxy that serves from several image-server backend machines.
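As a sketch, the Django side of that is just a STATIC_URL pointing at the other hostname (static.example.com below is a placeholder):
# settings.py (sketch; static.example.com is a placeholder hostname)
STATIC_URL = "https://static.example.com/static/"
STATIC_ROOT = "/var/www/example.com/static/"  # where collectstatic gathers the files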
Introduction
I encountered this very interesting issue this week; better to start with some facts:
pylibmc is not thread safe: when used as the Django memcached backend, starting multiple Django instances directly in the shell would crash when hit with concurrent requests.
if deployed with nginx + uWSGI, this problem with pylibmc magically disappears.
if you switch the Django cache backend to python-memcached, that also solves the problem, but this question isn't about that.
Elaboration
Starting with the first fact, this is how I reproduced the pylibmc issue:
The failure of pylibmc
I have a Django app which does a lot of memcached reading and writing, and there's this deployment strategy: I start multiple Django processes in the shell, bound to different ports (8001, 8002), and use nginx to do the load balancing.
I initiated two separate load tests against these two Django instances, using locust, and this is what happens:
In the above screenshot they both crashed and reported exactly the same issue, something like this:
Assertion "ptr->query_id == query_id +1" failed for function "memcached_get_by_key" likely for "Programmer error, the query_id was not incremented.", at libmemcached/get.cc:107
uWSGI to the rescue
So in the above case, we learned that multi-threaded concurrent requests towards memcached via pylibmc can cause issues; this somehow doesn't bother uWSGI with multiple worker processes.
To prove that, I start uWSGI with the following settings included:
master = true
processes = 2
This tells uWSGI to start two worker processes. I then tell nginx to serve any Django static files and route non-static requests to uWSGI, to see what happens. With the server started, I launch the same locust test against Django on localhost, and make sure there are enough requests per second to cause concurrent requests against memcached. Here's the result:
In the uWSGI console, there's no sign of dead worker processes, and no worker has been respawned, but looking at the upper part of the screenshot, there certainly have been concurrent requests (5.6 req/s).
The question
I'm extremely curious about how uWSGI makes this go away, and I couldn't learn that from their documentation. To recap, the question is:
How does uWSGI manage worker processes, so that multi-threaded memcached requests don't cause Django to crash?
In fact, I'm not even sure that it's the way uWSGI manages worker processes that avoids this issue, or some other magic that comes with uWSGI that's doing the trick. I've seen something called a memcached router in their documentation that I didn't quite understand; is that related?
Isn't it because you actually have two separate processes managed by uWSGI? Since you are setting the processes option (instead of the workers option), you should actually have multiple uWSGI processes (I'm assuming a master + two workers because of the config you used). Each of those processes has its own loaded pylibmc, so there is no state shared between threads (you haven't configured threads in uWSGI, after all).
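For illustration only (this is not what uWSGI itself does), one way to get the same isolation inside a threaded server is to avoid sharing a single pylibmc client across threads, e.g. with a thread-local client:
import threading
import pylibmc

# Sketch: give each thread its own pylibmc client instead of sharing one,
# mirroring the isolation that separate worker processes give you for free.
_local = threading.local()

def get_memcached_client():
    if not hasattr(_local, "client"):
        _local.client = pylibmc.Client(["127.0.0.1"], binary=True)
    return _local.client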
After reading a lot of blog posts, I decided to switch from crontab to Celery for my middle-scale Django project. There are a few things I don't understand:
1- I'm planning to start a micro EC2 instance dedicated to RabbitMQ; would this be sufficient for small-to-medium task loads (such as dispatching periodic e-mails via Amazon SES)?
2- Does computation of tasks occur on the Django server or the RabbitMQ server (assuming RabbitMQ is on a separate server)?
3- When I need to grow my system and have 2 or more application servers behind a load balancer, do these Celery machines need to connect to the same RabbitMQ vhost? Assume the application servers are carbon copies, the tasks are the same, and everything is in sync at the database level.
I don't know the answer to this question, but you can definitely configure it to be suitable (e.g. use -c1 for a single-process worker to avoid using much memory, or eventlet/gevent pools); see also the --autoscale option. The choice of broker transport also matters here: the ones that are not polling are more CPU efficient (rabbitmq/redis/beanstalk).
Computing happens on the workers, the broker is only responsible for accepting, routing and delivering messages (and persisting messages to disk when necessary).
To add additional workers, these should indeed connect to the same virtual host. You would only use separate virtual hosts if you wanted applications to have separate message buses.
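As a sketch, that just means every application server's Celery instance is configured with the same broker URL (the credentials, host and vhost name below are placeholders):
from celery import Celery

# Both application servers point at the same broker URL / vhost.
app = Celery(
    "myproject",
    broker="amqp://user:password@rabbitmq-host:5672/myvhost",
)

# A single-process, low-memory worker can then be started with, e.g.:
#   celery -A myproject worker -c 1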