Memory monitoring on Heroku - django

We're running Django on Heroku, and I'm looking for a way to monitor how much memory is being used. Once we go over our limit, we get errors that do tell us how much memory we are using, but I'd like to see how memory is ebbing and flowing even when we are under our limit.
It seems like basic functionality, but I haven't seen anything in the Heroku dyno docs that suggests a way to do it. I'd really appreciate any pointers.
Thanks a lot!
Clay

I use Django on Heroku as well. The best way to monitor memory is to install the New Relic add-on. When you first install it (the free version), you get a week of their 'advanced' usage tier for free, which lets you see all the stats about your application:
Total requests per minute.
Average response time.
Slow pages.
Database queries (response times, etc.).
End user page load times.
A variety of analytics.
Background job stats (memory, cpu, etc.).
Available RAM, RAM usage over time, etc.
and lots more
For reference, here's a screenshot so you can see what I'm talking about:
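Separately from the add-on, a rough picture of how memory ebbs and flows can also be logged from inside the app itself. The following is only a minimal sketch, assuming a Linux dyno where `/proc/self/status` is readable and a Django version that supports callable-style middleware; `parse_vm_rss_kb`, `current_rss_mb` and `MemoryLoggingMiddleware` are hypothetical names, not part of Django or Heroku.

```python
import logging
import re

logger = logging.getLogger("memory")


def parse_vm_rss_kb(status_text):
    """Extract the VmRSS value (in kB) from /proc/<pid>/status text."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def current_rss_mb(path="/proc/self/status"):
    """Return this process's resident memory in MB, or None if unavailable."""
    try:
        with open(path) as fh:
            rss_kb = parse_vm_rss_kb(fh.read())
    except OSError:
        # Not on Linux, or /proc is not readable (assumption: Linux dyno).
        return None
    return None if rss_kb is None else rss_kb / 1024.0


class MemoryLoggingMiddleware:
    """Hypothetical middleware: logs resident memory after each request."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        response = self.get_response(request)
        rss = current_rss_mb()
        if rss is not None:
            logger.info("rss_mb=%.1f path=%s", rss, getattr(request, "path", "?"))
        return response
```

If something like this were added to the middleware setting, each request would leave one log line with the worker's resident memory, which can then be graphed from the log stream. It measures one worker process only, so the dyno total is the sum over workers.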


Django high memory usage

I'm using django as a backend for a React frontend, and deploying both applications on Heroku.
I also use Gunicorn to serve the application, and I signed up for the Hobby plan on Heroku, which offers 512 MB of RAM for the application to run.
But the Django dyno is almost always using a lot of memory and exceeding the 512 MB limit. It goes down to only 40 MB whenever I restart or deploy changes, but as soon as any user uses the system and triggers some queries, the memory goes up a lot.
I've read about django and django-rest-framework memory optimization for some days now, and tried some changes like: using --preload on Gunicorn, setting --max-requests to kill process when they're too heavy on memory, I've also set CONN_MAX_AGE for the database and WEB_CONCURRENCY as stated on:
https://devcenter.heroku.com/articles/python-concurrency-and-database-connections
But none of that gave me a good enough result. My current guess is that my queries are the problem, because I've seen some articles about using .iterator() on queries and how it prevents the queries from being cached by the application, and I didn't use it in any of my queries.
I don't think caching the queries would help on my application at all, I even store some of the results on React state exactly to keep the queries from being called again.
I tried using .iterator() on some queries but I noticed that when the memory goes up on the container it stays up for a very long time. I saw consumption remain the same for up to 6 hours straight, and I don't think that a query cache would be maintained in memory for so long (or would it?).
So, now I'm a little confused about what to try next and on what I should focus and any help is welcome. Thanks in advance!
EDIT
Just attached an image which shows the memory usage going up 60 MB only because I called the logout function!! Makes no sense to me... Also after it goes up it takes a really, really long time to go back down again.
PRINT_FROM_HEROKU_LOGS
You have to use a memory profiler to see which function or method allocates the memory.
One such tool is memray. After installing it, run the Django server like this:
python -m memray run ./manage.py runserver
Visit the pages or call the APIs that might use a lot of memory, then end the program (on Linux, press Ctrl+C).
It will generate a file with memory usage details and show you how to convert it to a readable format. You can paste the result here to get some insights if you can't interpret it yourself.
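If installing memray is not possible, the standard library's tracemalloc module gives a coarser view of the same idea. Note that this is a swapped-in technique, not memray itself. The sketch below compares two snapshots taken around a piece of work; `top_allocations` and `build_big_list` are hypothetical names, and the workload is just a stand-in for a view or query you suspect.

```python
import tracemalloc


def top_allocations(workload, limit=5):
    """Run workload() and return the source lines whose allocations grew most."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        result = workload()  # keep a reference so the allocations stay alive
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # Differences are grouped per source line, largest change first.
    stats = after.compare_to(before, "lineno")
    return result, stats[:limit]


def build_big_list():
    # Stand-in workload (assumption): replace with the code under suspicion.
    return [str(i) * 10 for i in range(50_000)]


if __name__ == "__main__":
    _, stats = top_allocations(build_big_list)
    for stat in stats:
        print(stat)
```

Each printed entry names a file and line together with the size difference, which is usually enough to tell whether a particular query or serializer is responsible for the growth.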

Google Cloud - Stack recommendation for Tomcat/PostgreSQL/HTTPS/SFTP site?

This is my first attempt at looking into cloud hosting and I'm feeling like a complete idiot. I have always had my own dedicated server, which I would remote into and install/manage everything myself. So this cloud thing is completely new for me. I just can't seem to grasp basic things, like how I would get Tomcat and PostgreSQL installed in a way that they could talk to each other, or how I would get my domain and SSL cert on there, etc.
If I could just get a feel for where I should start, then I could probably calculate my costs and jump into the free trial where hopefully things will click for me.
Here are my basic, high-level requirements...
My web app running in Tomcat over HTTPS
Let's say approximately 1,000 page views per day
PostgreSQL supporting my web app.
Let's say approximately 10GB database storage
Throughout the day, a fairly steady stream of inbound SFTP data (~ 100MB per day)
The processing load on the app server side should be fairly light. The heavy lifting will be on the DB side, sorting through and processing lots of data.
I'm having trouble figuring out which options I would install and calculating costs. If someone could help me get started by saying something like "You would start with a std-xyz-med server, install ABC located here at http://blahblah, then install XYZ located at http://XYZ.... etc.. etc. You can expect to pay somewhere around $100-$200 per month"....
Thoughts?
I would be eternally grateful. It seems like they should have some free sales support channel to ask someone at Google about this, but I don't see it.
Thank You!
I'll try to give you some tips where to start looking.
I will be referring to some products, here are the links
If you want to stick to your old ways, you can always spin up an instance on Compute Engine and set it up the same way you did before; these are just regular virtual machines. For some use cases this is completely valid.
You can split different components of your stack to different products:
For example, if your app is fine with postgresql, you can spin up a fully managed service in Cloud SQL, which might make it easier to manage backup or have several apps access the same db.
Alternatively, have a look at the different DB offerings to see if any of them matches your needed workload better. Perhaps have a look at BigQuery?
If you want to turn your app into a microservice, which is then easier to autoscale and is more fault tolerant, have a look at App Engine. That way you don't need to manage a virtual machine. The docs here will lead you through some easy to follow examples on how to set up SSL.
For the services to talk to each other, refer to docs of the individual components. It's usually very simple.
With pricing, try https://cloud.google.com/products/calculator/
Things like BigQuery have different pricing models - you don't pay for server uptime, but for amounts of data stored & processed with your queries.

Django "migrate" consuming too much CPU

Our staging server, a t2.micro instance on AWS, kept going down. On investigating, we found that when manage.py migrate is run, CPU usage shoots up to 99%. It was easily reproducible on a local machine. We are running Django 1.9 with a PostgreSQL database. I am not sure whether we are doing something wrong or whether it is meant to be that way. We have around 18 apps in the project, but running migrate app_name also results in the same behaviour. Attaching the screenshots of CPU usage.
Also, I profiled the migrate function, here is a graph:
Are you depending on migrate to run regularly? Because once the project is nearing and then entering production state, there shouldn't be many migrations to run. Or do you mean that migrate takes this long, even if migrate --list shows that there is nothing to migrate?
Also, to know what Postgres is doing, you should set up logging of queries including their time. You can filter to log only longer running queries:
http://www.postgresql.org/docs/9.5/static/runtime-config-logging.html
Run those queries through the explain analyze sql command:
psql> EXPLAIN ANALYZE <complete query>;
http://www.postgresql.org/docs/9.5/static/using-explain.html
You need to provide the information you get from explain to get further help.
EDIT:
Also, you could try to squash migrations if you have a lot of migration files. I could imagine that Django works itself through all of them, one by one. So if you have many apps with many files depending on each other, you can imagine what happens.
https://docs.djangoproject.com/en/1.9/topics/migrations/#squashing-migrations
EDIT 2:
Moving this from the comment into the answer:
Does migrate --list also consume that much CPU? If not, then you could run it first, see whether there really is a need to migrate and only run migrate on those apps that have open migrations.
I think this would be the best. If you can profile in more detail, you might actually address the Django community for help. I could imagine that you have an interesting setup with which to find out how to tune the Django migrations to do less (actually unnecessary) work. But I don't know the migrations code too much so I cannot tell.
But this also depends on how many apps we are talking about, and how many migration files. If you have less than 30 apps (including 3rd party), I think it should work fine and there is something else wrong (IMHO!).
Also, you have not shown the resource usage of your server. If the slowness is due to swapping/too much RAM usage you really might be able to boost it by supplying more RAM (to the process).
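The "check first, then migrate only what is open" idea from EDIT 2 could be sketched as below. This is a rough illustration, not Django's own API: it assumes the text output of `manage.py showmigrations --plan`, with lines of the form `[X]  app.0001_name` for applied and `[ ]  app.0002_name` for pending migrations, and `apps_with_pending_migrations` is a hypothetical helper name.

```python
import re

# Assumed line format: "[X]  app_label.migration_name" or "[ ]  app_label.migration_name"
PLAN_LINE = re.compile(r"^\[(?P<mark>[ X])\]\s+(?P<app>\w+)\.(?P<name>\S+)")


def apps_with_pending_migrations(plan_output):
    """Return app labels, in first-seen order, that have unapplied migrations."""
    pending = []
    for line in plan_output.splitlines():
        match = PLAN_LINE.match(line.strip())
        if match and match.group("mark") == " " and match.group("app") not in pending:
            pending.append(match.group("app"))
    return pending


# Made-up sample output, for illustration only.
sample = """[X]  contenttypes.0001_initial
[X]  auth.0001_initial
[ ]  orders.0003_add_status
[ ]  orders.0004_index
[ ]  billing.0002_invoice
"""
print(apps_with_pending_migrations(sample))  # ['orders', 'billing']
```

A deploy script could capture the real command output with subprocess, feed it to this function, and run migrate only for the returned apps, skipping the step entirely when the list is empty.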
I believe migrations consume a lot of resources, especially with many models and many apps: more apps mean more dependencies and more migration complexity.
I would recommend starting a new instance that only runs the migrations and shuts down afterwards. This way your web server stays reachable.
This does not address the problem statement exactly, but part of it. I went through the AWS documentation for t2.micro and found that T2 instances are designed to handle short CPU bursts (~1 min) separated by reasonably long intervals. From the t2.micro documentation:
A CPU Credit provides the performance of a full CPU core for one minute. Traditional Amazon EC2 instance types provide fixed performance, while T2 instances provide a baseline level of CPU performance with the ability to burst above that baseline level. The baseline performance and ability to burst are governed by CPU credits.
Given the above, running migrations shouldn't be an issue even if it consumes 100% of the CPU. We investigated further and found that cron jobs were running on the server that were not supposed to be.
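To make the credit arithmetic from the quoted documentation concrete, here is a small sketch. It relies only on the quoted rule that one credit equals one full core for one minute. The balance and earn rate in the example are assumptions for illustration (check the current AWS documentation for the real t2.micro figures), and `minutes_until_credits_exhausted` is a hypothetical name.

```python
def minutes_until_credits_exhausted(balance, utilization, earn_per_hour):
    """Minutes a burstable instance can sustain `utilization` (0..1 of one core).

    One credit buys one full core for one minute, so running at `utilization`
    spends `utilization` credits per minute while `earn_per_hour / 60` credits
    per minute are earned back. Returns None when credits are earned at least
    as fast as they are spent.
    """
    spend_per_min = utilization
    earn_per_min = earn_per_hour / 60.0
    net = spend_per_min - earn_per_min
    if net <= 0:
        return None
    return balance / net


# Assumed figures: 30 credits in the bank, 6 credits earned per hour, CPU pegged at 100%.
print(round(minutes_until_credits_exhausted(30, 1.0, 6), 1))  # 33.3
```

Under these assumed numbers a single migrate run at full CPU fits comfortably within the balance, whereas stray cron jobs burning CPU around the clock would drain it and leave the instance throttled at its baseline.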

Separate server for Memcache/Redis?

I am using Django for my project and I'll be hosting it on Linode or another hosting service. If I want to use memcache, will I need a new Linode for it? That is, will one server be enough, or will I have to host my site on two servers, one for memcache and one for Django? Is it the same for Redis? Also, will I need a separate server for MySQL?
I don't think you understand that nobody is a fortune-telling wizard. Nobody knows how many requests you will receive per second, nor how CPU/memory intensive each request will be. Nobody knows how optimized your code is. Nobody knows if your application is read heavy or write heavy. Your use case is your own, and you're probably the only one who can estimate it.
My only actual advice to you is to try to estimate your data volume and server load and benchmark your setup on one machine. If you are unsatisfied with the performance, then scale up. You can either scale vertically, by increasing the size of your Linode, or scale horizontally, by adding more Linode instances. In the latter case, you will most likely put your DB on a machine of its own and have multiple Django instances fed by a load balancer. These Django instances could all share the same memcache on one machine, or they could each have their own memcache on their own machine. Which one is better? I can't tell you. It again depends on your use case.
If I were you, I would set it all up on one linode instance. I would create test data that I assume would be close to real world. Then I would try to test my response times with an estimated number of requests per second. I would measure response times, cache hits, and memory usage. I would then decide based on that if my use case is satisfied with this level of performance or not because I'm really the only one who would know what is satisfactory performance. Additionally, adding more linode resources is not necessarily where I would first try and improve performance.
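The measuring step could be sketched with nothing but the standard library, as below. This is a rough illustration rather than a replacement for a real load-testing tool; `fetch` and `benchmark` are hypothetical names, and the URL in the usage note is an assumption.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url):
    """Fetch one URL and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    with urlopen(url, timeout=10) as response:
        response.read()
    return time.perf_counter() - start


def benchmark(timed_call, arg, total=100, concurrency=10):
    """Call timed_call(arg) `total` times across `concurrency` threads and summarize."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, [arg] * total))
    return {
        "requests": len(latencies),
        "mean_s": statistics.mean(latencies),
        "median_s": statistics.median(latencies),
        # Simple nearest-rank style index into the sorted latencies.
        "p95_s": latencies[int(0.95 * (len(latencies) - 1))],
        "max_s": latencies[-1],
    }
```

Calling `benchmark(fetch, "http://127.0.0.1:8000/")` against a test instance loaded with realistic data gives a first set of numbers to compare before and after each change. Cache hit rates and memory usage still have to be read from the cache and the OS separately.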
Some great tips on optimizing and benchmarking can be found here:
https://docs.djangoproject.com/en/1.8/topics/performance/
http://blog.disqus.com/post/62187806135/scaling-django-to-8-billion-page-views
http://scottbarnham.com/blog/2008/04/28/django-performance-testing-a-real-world-example/
Late night reading about scaling up Django can be found in many books, I like this one:
https://highperformancedjango.com/
Sorry if I sound a bit blunt, I just want you to understand that nobody can walk in here and give you an answer with a large degree of confidence. This question doesn't have a straight-forward answer.
TL;DR Start with one instance and scale up only if you've convinced yourself you need to.
You say Memcached or Redis, so I assume Redis would be deployed without persistence, with a purely in-memory configuration.
In that case both Memcached and Redis are unlikely to get saturated even if you run them on one server, since the limiting factor is more likely to be the single Django instance if your requests/second go high.
However, you should make sure to have enough memory and to configure an appropriate max memory usage for Memcached / Redis (there are different ways to accomplish this in the two services). Note that under memory pressure the Linux OOM killer may otherwise kill your cache, so if you go for a single instance, which seems to me a sensible first step, make sure your Django memory usage plus the memory you allocate for caching stays well below the instance's available memory.
CPU is hardly going to be an issue as I said since Memcached / Redis are pretty good at using little CPU, so I can't foresee a setup where Django is ok serving pages but the instance is in trouble since the CPU is burned by the cache.
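The memory budgeting described above is simple arithmetic, sketched here. Every number in the example is a made-up assumption to be replaced with measurements from your own instance, and `cache_memory_budget_mb` is a hypothetical helper name.

```python
def cache_memory_budget_mb(instance_mb, django_workers, mb_per_worker, os_headroom_mb):
    """Memory (MB) left for the cache after Django workers and OS headroom."""
    budget = instance_mb - django_workers * mb_per_worker - os_headroom_mb
    if budget <= 0:
        raise ValueError("no memory left for a cache on this instance")
    return budget


# Assumed figures: 2 GB instance, 3 workers at ~250 MB each, 512 MB kept free for the OS.
print(cache_memory_budget_mb(2048, 3, 250, 512))  # 786
```

The resulting figure is what you would then hand to the cache as its cap, for example through Memcached's `-m` option or Redis's `maxmemory` setting, ideally rounded down to leave some slack.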

How to profile Django's bottlenecks for scaling?

I am using django and tastypie for REST API.
For profiling, I am using django-silk and below is a summary of requests:
How do I profile the complete flow? The time taken excluding database queries is (382 - 147) ms on average. How do I figure out the bottleneck and optimize/scale? I did use @silk_profile() on the get_object_list method for this resource, but even this method doesn't seem to be the bottleneck.
I used caching for decreasing response time, but that didn't help much, what are the other options?
When testing using loader.io, the peak the server can handle is 1000 requests per 30 secs (which seems very low). Other than caching (which I already tried) what might help?
Here's a bunch of suggestions:
bring the number of queries per request below 5 at least (34 per request is really bad)
install Django Debug Toolbar and have a look at where the time is spent
use gunicorn or uwsgi behind a reverse proxy (NGINX)
You have too many queries. Even if they are relatively fast, you spend some time reaching the database, etc. Also, if you have external cache storage (for example, Redis), it could take some time to connect there.
To investigate slow parts of the code you have two options:
Use a profiler. Note that profiling on a local PC may not make sense if you have a distributed system deployed across several machines.
Add tracing points to your code that record a message and the current time (something like https://gist.github.com/dbf256/0f1d5d7d2c9aa70bce89). Deploy this patched code, test it with your load-testing tool, and check the logs.
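A tracing point along the lines of the second option could look like the sketch below. It is not the code from the linked gist, just a minimal standard-library illustration; `trace` and the `sink` parameter are hypothetical names.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("trace")


@contextmanager
def trace(label, sink=None):
    """Log (and optionally collect) how long the wrapped block took."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        logger.info("trace %s took %.1f ms", label, elapsed_ms)
        if sink is not None:
            # Collecting into a list makes the timings easy to inspect in tests.
            sink.append((label, elapsed_ms))
```

Wrapping suspicious sections, for example `with trace("serialize"):` around the serialization step of a resource, and then running the load test produces one log line per section per request, which shows where the non-database time actually goes.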