Django "migrate" consuming too much CPU

Our staging server, a t2.micro instance on AWS, kept going down. On investigating, we found that whenever manage.py migrate is run, CPU usage shoots up to 99%. It was easily reproducible on the local machine. We are running Django 1.9 with a PostgreSQL database. I am not sure whether we are doing something wrong or it is meant to be that way. We have around 18 apps in the project, but running migrate app_name also results in the same behaviour. Attaching screenshots of the CPU usage.
Also, I profiled the migrate function, here is a graph:

Are you depending on migrate running regularly? Once the project is nearing, and then entering, production, there shouldn't be many migrations left to run. Or do you mean that migrate takes this long even if migrate --list shows that there is nothing to migrate?
Also, to know what Postgres is doing, you should set up logging of queries including their time. You can filter to log only longer-running queries:
http://www.postgresql.org/docs/9.5/static/runtime-config-logging.html
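If you first want to see the queries from the Django side (a sketch, not the Postgres-side logging the link above describes), a DEBUG-only logging config like this prints every SQL statement together with its execution time; django.db.backends is Django's built-in SQL logger, the other values are just example choices:

# settings.py -- only works while DEBUG = True; remove before production
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'console': {'class': 'logging.StreamHandler'},
    },
    'loggers': {
        # Logs every SQL statement with its duration
        'django.db.backends': {
            'handlers': ['console'],
            'level': 'DEBUG',
        },
    },
}

Slow statements spotted this way are the ones worth feeding into EXPLAIN ANALYZE below.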
Run those queries through the EXPLAIN ANALYZE SQL command:
psql> EXPLAIN ANALYZE <complete query>;
http://www.postgresql.org/docs/9.5/static/using-explain.html
You need to provide the information you get from EXPLAIN ANALYZE to get further help.
EDIT:
Also, you could try to squash your migrations if you have a lot of migration files. I could imagine that Django works through all of them, one by one, so if you have many apps with many migration files depending on each other, you can imagine what happens.
https://docs.djangoproject.com/en/1.9/topics/migrations/#squashing-migrations
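As a rough sketch (the app label and migration name are made up; substitute your own):

python manage.py squashmigrations shop 0012

This replaces migrations 0001 through 0012 of the hypothetical shop app with a single squashed migration, so migrate has far fewer files to load and resolve.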
EDIT 2:
Moving this from the comment into the answer:
Does migrate --list also consume that much CPU? If not, you could run it first, see whether there really is anything to migrate, and only run migrate on the apps that have unapplied migrations.
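For reference, in Django 1.9 the same check has its own command (migrate --list is deprecated in favour of it):

python manage.py showmigrations

Apps whose migrations are all marked [X] are fully applied and can be skipped.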
I think this would be the best approach. If you can profile in more detail, you might also ask the Django community for help. I could imagine you have an interesting setup for finding out how to tune Django's migrations to do less (actually unnecessary) work, but I don't know the migrations code well enough to tell.
This also depends on how many apps we are talking about, and how many migration files. If you have fewer than 30 apps (including 3rd party), I think it should work fine and something else is wrong (IMHO!).
Also, you have not shown the resource usage of your server. If the slowness is due to swapping or excessive RAM usage, you might well be able to improve things by giving the process more RAM.

I believe migrations consume a lot of CPU, especially with many models and many apps: more apps mean more dependencies and more migration complexity.
I would recommend starting a separate instance that only runs the migration and shuts down afterwards. That way your web server stays reachable.

This does not address the problem statement exactly, but part of it. I went through the AWS documentation and found that t2.micro instances are designed to handle CPU bursts of short duration (~1 minute) occurring after reasonably long intervals. From the t2.micro documentation:
A CPU Credit provides the performance of a full CPU core for one minute. Traditional Amazon EC2 instance types provide fixed performance, while T2 instances provide a baseline level of CPU performance with the ability to burst above that baseline level. The baseline performance and ability to burst are governed by CPU credits.
Given this, running migrations shouldn't be an issue even if they consume 100% of the CPU: a t2.micro earns 6 CPU credits per hour, so a roughly one-minute burst for migrate spends about one credit and is earned back within about ten minutes. We investigated further and found cron jobs running on the server that were not supposed to be there.

Related

Django high memory usage

I'm using django as a backend for a React frontend, and deploying both applications on Heroku.
I also use Gunicorn to serve the application and signed up for the Hobby plan on Heroku, which offers 512 MB of RAM for the application to run.
But the Django dyno is almost always using a lot of memory and exceeding the 512 MB limit. It goes down to only 40 MB of usage whenever I restart or deploy changes, but as soon as any user uses the system and triggers some queries, the memory goes up a lot.
I've read about Django and django-rest-framework memory optimization for some days now and tried some changes, like using --preload on Gunicorn and setting --max-requests to recycle workers when they get too heavy on memory. I've also set CONN_MAX_AGE for the database and WEB_CONCURRENCY as described at:
https://devcenter.heroku.com/articles/python-concurrency-and-database-connections
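Roughly, the combination I mean looks like this (a simplified sketch, not my exact config; names and numbers are placeholders):

# Procfile
web: gunicorn myproject.wsgi --preload --max-requests 1000 --max-requests-jitter 50

# settings.py
DATABASES['default']['CONN_MAX_AGE'] = 60   # reuse DB connections for 60 seconds

# WEB_CONCURRENCY (read by Gunicorn as the default worker count) is set as a Heroku config var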
But none of that gave me a good enough result. What I'm guessing is wrong now is my queries, because I've seen some articles about using .iterator() on querysets and how it prevents the results from being cached by the application, and I haven't used it in any of my queries.
I don't think caching the queries would help my application at all; I even store some of the results in React state precisely to keep the queries from being called again.
I tried using .iterator() on some queries, but I noticed that when the memory goes up on the container it stays up for a very long time. I saw consumption remain the same for up to 6 hours straight, and I don't think a query cache would be kept in memory for that long (or would it?).
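To be concrete, the .iterator() pattern I'm referring to looks like this (the model, field and processing function are made up for illustration):

# Iterates over rows without populating the queryset's result cache,
# so the full result set is never held in memory at once.
for reading in SensorReading.objects.filter(device_id=device_id).iterator():
    process(reading)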
So, now I'm a little confused about what to try next and on what I should focus and any help is welcome. Thanks in advance!
EDIT
I just attached an image which shows the memory usage going up 60 MB only because I called the logout function! That makes no sense to me... Also, after it goes up it takes a really, really long time to come back down again.
PRINT_FROM_HEROKU_LOGS
You have to use a memory profiler to see which functions or methods allocate memory.
An example tool is memray. After installing it, run the Django server like this:
python -m memray run ./manage.py runserver
Visit the pages or call the APIs that might use a lot of memory, then end the program (on Linux, use Ctrl+C).
It will generate a file with memory usage details and show you how to convert it to a readable format; you can paste it here to get some insights if you can't read it yourself.
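For example, the capture file can be turned into an HTML flamegraph (the file name below is a placeholder for whatever memray reports when it finishes):

python -m memray flamegraph <output-file>.bin

Open the generated HTML in a browser and look for the widest frames; those are the call paths allocating the most memory.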

Postgresql in memory database django

For performance reasons I would like to run an optimization algorithm against an in-memory database in Django (I'm likely to execute a lot of queries). I know it's possible to use SQLite in memory (How to run Django's test database only in memory?), but I would rather use PostgreSQL because our production database is a PostgreSQL one.
Does anyone know how to tell Django to create the PostgreSQL database in memory?
Thanks in advance
This is premature optimization. PostgreSQL is very, very fast even if you run it on a plain spinning hard disk, provided you use the right indexes. If you don't persist the data on disk, you are opening yourself up to a world of pain.
If, on the other hand, you want to speed up your tests by running an in-memory PostgreSQL database, you can try the non-durability optimizations described here:
http://www.postgresql.org/docs/9.1/static/non-durability.html
The most drastic suggestion on that page is to use a ramdisk; here's how to set one up. After following the OS/PostgreSQL steps, edit the Django settings.py and point the project at the new tablespace (for example via the DEFAULT_TABLESPACE setting).
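A minimal sketch of the Django side, assuming a tablespace named ramdisk_ts was created on the ramdisk (the name and credentials are placeholders):

# settings.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'optimizer_db',
        'USER': 'optimizer',
        'PASSWORD': 'secret',
        'HOST': 'localhost',
    }
}
# Tablespace used by default for tables and indexes Django creates
DEFAULT_TABLESPACE = 'ramdisk_ts'

Django only passes the tablespace name through in its DDL; the tablespace itself must already exist in PostgreSQL.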
Last but not least: This is just a complete waste of time.
This is not possible. You cannot use PostgreSQL exclusively in memory; at least not without defeating the purpose of using PostgreSQL over something else. An in-memory data store like Redis running alongside PostgreSQL is the best you can do.
Also, at this point, the configuration is far out of Django's hands. It will have to be done outside of Django's environment.
It appears you may be missing some understanding about how these systems work. Build your app first to verify that everything is working, then worry about optimizing the database in such ways.

Separate server for Memcache/Redis?

I am using Django for my project and I'll be hosting it on Linode or another hosting service. If I want to use Memcached, will I require a new Linode for it? That is, will just one server be OK, or will I have to host my site on two servers, one for Memcached and one for Django? And is it the same for Redis? Also, will I require a separate server for MySQL?
I don't think you understand that nobody is a fortune-telling wizard. Nobody knows how many requests you will receive per second, nor how CPU- or memory-intensive each request will be. Nobody knows how optimized your code is. Nobody knows if your application is read-heavy or write-heavy. Your use case is your own, and you're probably the only one who can estimate it.
My only actual advice is to estimate your server data and server load and benchmark your setup on one machine. If you are unsatisfied with the performance, then scale up. You can either scale vertically, by increasing the size of your Linode, or horizontally, by adding more Linode instances. In the latter case, you will most likely put your DB on a machine of its own and have multiple Django instances fed by a load balancer. These Django instances could each share the same Memcached on one machine, or they could each have their own Memcached on their own machines. Which one is better? I can't tell you. It again depends on your use case.
If I were you, I would set it all up on one Linode instance. I would create test data that I assume would be close to the real world. Then I would test my response times with an estimated number of requests per second. I would measure response times, cache hits, and memory usage, and decide based on that whether my use case is satisfied with this level of performance, because I'm really the only one who knows what satisfactory performance is. Additionally, adding more Linode resources is not necessarily where I would first try to improve performance.
Some great tips on optimizing and benchmarking can be found here:
https://docs.djangoproject.com/en/1.8/topics/performance/
http://blog.disqus.com/post/62187806135/scaling-django-to-8-billion-page-views
http://scottbarnham.com/blog/2008/04/28/django-performance-testing-a-real-world-example/
Late-night reading about scaling up Django can be found in many books; I like this one:
https://highperformancedjango.com/
Sorry if I sound a bit blunt, I just want you to understand that nobody can walk in here and give you an answer with a large degree of confidence. This question doesn't have a straight-forward answer.
TL;DR Start with one instance and scale up only if you've convinced yourself you need to.
You say Memcached or Redis, so I assume Redis would be deployed without persistence, with a purely in-memory configuration.
In that case, both Memcached and Redis are unlikely to get saturated even if you run them on one server, since the limiting factor is more likely to be the single Django instance as your requests per second grow.
However, you should make sure to have enough memory and to configure an appropriate maximum memory usage for Memcached / Redis (this is accomplished differently in the two services). Note that under memory pressure the Linux OOM killer may otherwise kill your cache, so if you go for a single instance, which seems to me a sensible first step, make sure your Django memory usage plus the memory you allocate for caching does not come near the limit of the instance's free memory.
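For example (the sizes are arbitrary and should be derived from what your instance can spare):

# redis.conf
maxmemory 256mb
maxmemory-policy allkeys-lru

# memcached takes its limit (in MB) on the command line
memcached -m 256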
CPU is hardly going to be an issue, as I said, since Memcached / Redis use very little CPU, so I can't foresee a setup where Django is fine serving pages but the instance is in trouble because the cache is burning the CPU.

Best practice: Multiple django applications on a single Amazon EC2 instance

I've been running a single Django application on Amazon EC2, using Gunicorn to serve the Django portion and nginx for the static files.
I'm going to be starting a new project soon, and I'm wondering which of the following options would be better:
A larger Amazon EC2 instance (Medium) running multiple Django applications
Multiple smaller EC2 instances (Small/Micro), each running its own Django application?
Does anybody have any experience with this? What would be the relevant performance metrics I could measure to get a good cost-to-performance ratio?
The answer to this question really depends on your app I'm afraid. You need to benchmark to be sure you are running on the right instance type. Some key metrics to watch are:
CPU
Memory usage
Requests per second, per instance size
App startup time
You will also need to tweak nginx/gunicorn settings to make sure you are running with a configuration that is optimised for your instance size.
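For instance, a common starting point from Gunicorn's own documentation (not a hard rule) is (2 × CPU cores) + 1 workers, so a sketch for a single-core instance might be (myproject is a placeholder):

gunicorn myproject.wsgi --workers 3 --timeout 30

Measure the metrics above and adjust the worker count (and nginx's worker_processes and keepalive settings) per instance size.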
If costs are a factor for you, one interesting metric is "cost per ten thousand requests", i.e. how much are you paying per 10000 requests for each instance type?
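A worked example with made-up numbers: if a Small instance costs $0.06/hour and sustains 20 requests/second, that is 72,000 requests/hour, so it costs roughly $0.06 / 7.2 ≈ $0.008 per 10,000 requests; a Medium at twice the hourly price would need to sustain roughly twice the throughput to match that figure.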
I agree with Mike Ryan's answer. I would add that you also have to evaluate whether your app needs a separate database. Sometimes it makes sense to isolate large/complex applications with their own database, which makes changes and maintenance easier. (It also reduces your risk if something goes wrong: not all of your user base would be affected by an outage.) You might want to create a separate instance for these applications. Note: Django supports multiple databases in one project but, again, that increases complexity for changes and maintenance.

Django redundancy and replication over two VPS accounts

I'm slowly getting into the position where one of my Django sites needs some robustness behind it. I'm currently running on a single VPS with a SQLite database and memcached. It's about as un-scaled as things can get.
If I bought another VPS account, what would I want to do?
Move to MySQL/PostgreSQL with replication? What's easiest? Does replication protect me from one server exploding? Are there concurrency downsides?
How do I load-balance between the two servers?
I'd put memcached on the new server too. If I put both IPs into the configuration, would that keep a copy of data on both servers? (I'm thinking of what happens to session data - currently stored in memcached)
I'm currently using Cherokee as the httpd - I'm sure this has its own set of issues. If you've any tips, let me know.
Am I going at this the wrong way? Is there an easier way to have faster, more robust django sites?
First step: switch from SQLite to a real production database (I like Postgres). This should happen long before you even think about a second VPS. SQLite essentially does not support concurrent writes at all. Personally, I wouldn't even consider deploying a live site on SQLite in the first place.
If your site is running on SQLite and is functioning, my guess is you are still quite a long way from actually outgrowing your single VPS (unless it's already heavily loaded otherwise).
If/when you do need to add a second server, how you configure things depends on where you're actually seeing a bottleneck. Chances are it'll be the database, in which case a good step might be simply moving the database onto its own server (presuming you can guarantee low latency between the two VPSes) and loading the database server with as much RAM as you can afford. In general disk performance suffers most in a VPS, so another step to consider might be putting the DB onto raw metal.
I'd probably look at those steps before I'd think about DB replication or multiple web-tier servers, but it really depends on profiling your actual case (and how you value performance vs reliability).
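If you do split the database out, the Django side of that move is small; a sketch, assuming the new database server sits at a private address like 10.0.0.2 (address and credentials are placeholders):

# settings.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'mysite',
        'USER': 'mysite',
        'PASSWORD': 'secret',
        'HOST': '10.0.0.2',   # the dedicated database VPS
        'PORT': '5432',
    }
}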
Watching the Django Deployment Workshop by Jacob Kaplan-Moss should give you a good overview.
MySQL supports master-slave and master-master setups; I don't use PostgreSQL, so I can't comment on it.
You can use nginx as your load balancer; HAProxy is an option too (Stack Overflow uses it).
Memcached distributes the cached objects across the servers; if one crashes, the data stored on it is lost (see the sketch below).
I don't know Cherokee, but nginx is great.
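A sketch of the memcached part, assuming the two VPSes are at 10.0.0.1 and 10.0.0.2 (placeholder addresses). Listing both locations makes Django's memcached client shard keys across the servers; it does not replicate them, which is why a crashed box loses the entries that lived on it:

# settings.py
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': [
            '10.0.0.1:11211',
            '10.0.0.2:11211',
        ],
    }
}

Since session data stored this way disappears with the server that held it, the cached_db session backend is worth considering if sessions matter to you.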