Memory Management in Django Applications

I'm running Django applications on WebFaction and an AWS EC2 micro instance (613 MB of RAM). For the past 2-3 months I've been hitting memory-over-limit issues (currently only 4-5 users are using this application), and because of them the MySQL and Apache processes are getting killed. I've taken the following steps to reduce memory consumption:
Removed ".all()" django queries.
Swap space of 1.5 GB.
Apache Configuration changed to:
StartServers 4
MinSpareServers 2
MaxSpareServers 4
MaxClients 7
ServerLimit 7
MaxRequestsPerChild 0
Changed MySQL's my.cnf to:
slow-query-log=1
max_connections=45
query_cache_size=16M
table_cache=128
tmp_table_size=32M
max_heap_table_size=33554432
Installed "Dozer" to find memory leaks(Not reporting any problem).
Somebody please let me know, what else can be done to reduce memory consumption.
Also let me know, how can I track the time, taken by a django filter query.

Do you already use django-debug-toolbar? It helps you track long or unwanted queries – this is for your local environment.
For the hosted application, make sure DEBUG is set to False. Django keeps a record of every query it runs, indefinitely, while debug mode is enabled.
If that doesn't help, search for global/class attributes that hold big data structures and move those to a cache or the database.
Also make sure not to sort long lists in your views/forms or iterate through very long querysets, because everything ends up in memory at once. Try to work in small batches instead, as in the sketch below.
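To measure how long a filter query takes (and to walk a big queryset in batches), a minimal sketch could look like the following. It assumes DEBUG is True locally so Django records query timings; the Post model and the process() helper are hypothetical placeholders, not part of the original question:

import time
from django.db import connection, reset_queries

reset_queries()                                      # drop previously recorded queries
start = time.perf_counter()
posts = list(Post.objects.filter(category="news"))   # list() forces evaluation; Post is a placeholder model
elapsed = time.perf_counter() - start
print("wall-clock time: %.3f s" % elapsed)

for q in connection.queries:                         # only populated while DEBUG = True
    print(q["time"], q["sql"])

# Processing a large table in small batches instead of loading it all at once:
for post in Post.objects.filter(category="news").iterator():
    process(post)                                    # process() stands in for your own per-row logic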

Related

Django high memory usage

I'm using Django as a backend for a React frontend, and deploying both applications on Heroku.
I also use Gunicorn to serve the application, and I signed up for the Hobby plan on Heroku, which offers 512 MB of RAM for the application to run in.
But the Django dyno is almost always using a lot of memory and exceeding the 512 MB limit. It goes down to only 40 MB of usage whenever I restart or deploy changes, but as soon as any user uses the system and triggers some queries, memory goes up a lot.
I've read about Django and django-rest-framework memory optimization for some days now and tried some changes, like using --preload on Gunicorn and setting --max-requests to kill workers before they grow too heavy on memory. I've also set CONN_MAX_AGE for the database and WEB_CONCURRENCY as described in:
https://devcenter.heroku.com/articles/python-concurrency-and-database-connections
But none of that gave me a good enough result. What I'm guessing is wrong now are my queries, because I've seen some articles about using .iterator() on querysets and how it prevents the results from being cached by the application, and I didn't use it in any of my queries.
I don't think caching the queries would help my application at all; I even store some of the results in React state precisely to keep the queries from being called again.
I tried using .iterator() on some queries, but I noticed that when memory goes up on the container it stays up for a very long time. I've seen consumption remain the same for up to 6 hours straight, and I don't think a query cache would be kept in memory for that long (or would it?).
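For context, the .iterator() pattern being discussed looks roughly like this; the Order model and the serialize() helper are hypothetical, not taken from the actual project:

# Default behaviour: the queryset caches every model instance it fetches,
# so the full result set is held on the queryset while the view runs.
rows = [serialize(o) for o in Order.objects.all()]

# With .iterator() the rows are streamed from the database cursor and the
# queryset keeps no result cache, which lowers peak memory for large tables.
rows = [serialize(o) for o in Order.objects.all().iterator()]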
So now I'm a little confused about what to try next and what I should focus on; any help is welcome. Thanks in advance!
EDIT
I just attached an image which shows memory usage going up by 60 MB only because I called the logout function! That makes no sense to me... Also, after it goes up, it takes a really, really long time to come back down.
(screenshot of the Heroku memory metrics omitted)
You have to use a memory profiler to see which functions or methods allocate memory.
An example tool is memray; after installing it, run the Django server like this:
python -m memray run ./manage.py runserver
Visit the pages or call the APIs that might use a lot of memory, then end the run (on Linux, use Ctrl+C).
It will generate a file with memory usage details and show you how to convert it to a readable format. You can paste that here to get some insights if you can't read it yourself.
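As an example of the conversion step, memray can turn the capture into an HTML flamegraph; the capture filename below is only a placeholder for whatever memray reports when it starts recording:

python -m memray flamegraph memray-manage.py.12345.bin

Open the generated HTML file in a browser and look at which call stacks are holding the most memory.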

Disabling Swapping of a Redis Instance on AWS's ElastiCache

We are trying to disable swapping RAM to disk for a Redis instance managed by AWS ElastiCache, but we couldn't find the right property to do so.
We also cannot find a way to SSH into it and turn off swapping from the kernel. Can you please help?
While this is not a direct answer to your question about disabling swapping, we've been struggling with Redis swapping on ElastiCache as well. What we ended up doing to address it is the following:
Followed Leo's suggestion of setting reserved memory
Run a nightly batch job to SCAN all keys in batches of 10,000. The SCAN command will evict any expired keys. This helps by proactively cleaning up the cache before swapping kicks in.
Run another custom batch job which processes entities we know can be evicted. These are entities which aren't as important as others in the cache. We've set up the keys so they contain enough information to easily identify the ones associated with an entity. Use SCAN with a MATCH pattern to find the keys; once you find them, call DEL on each (see the sketch after this list). This batch job alone is saving lots of space in our Redis instance. Word of caution: avoid the KEYS command, as it is slow and blocks the server while it runs.
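A minimal sketch of such a cleanup job using redis-py; the host name and key pattern are placeholders, not the real ones:

import redis

r = redis.Redis(host="my-cache.xxxxxx.use1.cache.amazonaws.com", port=6379)

# Walk the keyspace incrementally (never KEYS) and delete keys belonging
# to entities we know can be evicted, in modest batches.
batch = []
for key in r.scan_iter(match="evictable:*", count=10000):
    batch.append(key)
    if len(batch) >= 1000:
        r.delete(*batch)        # DEL accepts multiple keys at once
        batch = []
if batch:
    r.delete(*batch)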
We've been using the above for a few weeks now and so far it has been working well. In a few more weeks we'll know how well it works since we have a default TTL of 30 days and the number of cached items is still increasing.
Good luck!
Update
We turned off the job which uses SCAN on all keys. We discovered it was causing swap to slowly creep up (roughly 500k every other day). Once we turned that off, swap started shrinking. The combination of setting reserved memory and flushing objects we know can be expired is working well. When redis starts running out of room, it evicts any expired cached objects to make room for new entries. The only impact we've noticed is a very small increase in CPU usage, which isn't causing any trouble.
I had a similar problem, where ElastiCache (Redis) in AWS suddenly started using swap space even though we use the allkeys-lru eviction policy. The machine had not been using swap while consuming the whole memory for the past few weeks, until that changed one early morning.
I used the command
redis-cli -h elasticache.service-name memory DOCTOR
The output was:
High allocator fragmentation: This instance has an allocator external fragmentation greater than 1.1. This problem is usually due either to a large peak memory (check if there is a peak memory entry above in the report) or may result from a workload that causes the allocator to fragment memory a lot. You can try enabling 'activedefrag' config option.
Checking with the command
redis-cli -h elasticache.service-name memory STATS
I saw that the fragmentation value was high (1.4).
I then looked at the ElastiCache Redis parameter group in the AWS console and set activedefrag to true, as it was set to false.
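If you just want to keep an eye on fragmentation without the full MEMORY STATS dump, the same ratio is also exposed by INFO; the host name below is a placeholder, as above:

redis-cli -h elasticache.service-name INFO memory | grep mem_fragmentation_ratio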
It is not possible to connect to Elasticache via SSH.
Are you sure that you are having issues with Redis swapping to disk, rather than the host running out of memory and crashing (I've seen the latter happen with the default configuration)? If so, the guidance is to leave about 25% of the system memory available for host processes: http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/redis-memory-management.html

Sitecore publishing and lag of up to 30 seconds

We have noticed an interesting issue in our Sitecore install. Any auto-publish or scheduled publish job takes a long time compared to our other environments. Between individual jobs there seems to be a lag of anywhere from 5 to 30 seconds. In our other environments we do not see any lag; the difference between two publishing jobs there is less than a second.
We have tried the following up until now:
We have already checked for differences between the problematic and other environments and do not see any differences in configuration or code.
The caches are pretty similar in all environments.
We tried enabling parallel publishing but that did not make much difference.
Indexing is also very quick in the problematic environment and finishes within one second for each job.
At this point, we are not sure what is causing this issue. Any suggestions would be helpful.
Thanks
As Sitecore allows at most one publish to be executed at a time (to avoid data corruption), I would assume you might be adding publish jobs faster than they are processed, so they queue up.
In order to draw accurate conclusions, the publish operation needs to be profiled – that will show where the wall-clock time is spent (for example, ~80% on network and database operations and only 20% in Sitecore code).
You'll need to collect a few 20-second-long profiles while observing the publishing lag.
From there you'll see how the time is spent.
Please keep in mind that seeing stale content in the browser does not necessarily mean publishing is slow – there are many caching layers in between that can influence what you see.
Looks like I have a similar issue.
I have multiple IaaS Sitecore installations. Two environments (hosted on one VM) have much better performance (package installation, publishing, etc.).
I also have two more Sitecore installations on another VM, and publishing and package installation there are 4-5 times slower than on the first VM.
I used the same Sitecore installation configuration, but with a different prefix.
In my case I was migrating from Sitecore 8.2 to Sitecore 9.2. I used Unicorn to migrate data and saw that content publishing (which, it seems, writes to master) was slow right away.
So, on the first two environments the Unicorn migration, content publishing and package installation were much faster, while on the other two the process is slower.

finding out why a webapp is slow when hosted

I have a Django web app that uses a Postgres database. It allows users to log in and make posts, which get saved to the database; later the user can list how many posts they made on a particular day, list the posts belonging to a particular category, and so on. While this worked without any delay on my machine, every page takes a long time to load when hosted on a free host.
How do I find out why this is happening? Which part of the app should I look at first? Is there any point in using a profiler, given that this app ran with no delays on my local machine?
I would like to know how to approach this problem in general. I was able to access other apps hosted on the same free host without much delay, so this may be a problem specific to my app.
I would appreciate some advice on this.
Thank you.
P.S.: I intentionally left out the host's name because, since it is a free service, there is no point in complaining, and other apps on the same host work well.
The key phrase here is "free host": on a free host you could be sharing a box with hundreds of other sites, which can mean a very small amount of RAM or CPU for each of them. Pay a little money ($30 / £22 a year) and get yourself a better host.
You will find the performance and reliability so much better.
Failing that, I would first find out what the latency between you and the server is; on a local machine there is little to no network traffic, so your pages will appear to load a lot faster.
Next I would look at the actual download speeds you are getting. It could be that your site is limited to 20-30 KB/s, which means even a small page will take over a second to load.
Are you hosting many images? If so, are you serving them through Django, or is the web server doing it? If it is Django, make the web server take that load.
Finally, check the processing speed of the pages. Analyse the queries which are being run and find out what is taking the time. Make sure that Postgres is correctly configured and has enough resources. You can analyse query speed using the Django Debug Toolbar, as in the sketch below.
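A minimal sketch of enabling django-debug-toolbar for local analysis; the exact setup varies a little between versions of the package, so treat this as an outline rather than the definitive configuration:

# settings.py (development only)
DEBUG = True
INSTALLED_APPS += ["debug_toolbar"]
MIDDLEWARE += ["debug_toolbar.middleware.DebugToolbarMiddleware"]
INTERNAL_IPS = ["127.0.0.1"]          # the toolbar only renders for these addresses

# urls.py
from django.urls import include, path
urlpatterns += [path("__debug__/", include("debug_toolbar.urls"))]

With that in place, each page shows a panel listing every SQL query it ran and how long each one took.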

Django-based app might grow out of control - how can I scale it?

I need some guidance in the realm of server architecture for Django.
My current Django-based web app's stats (reached in two weeks – run on one VPS with Apache, mod_wsgi and MySQL):
10,000 users total
20 avg requests/user/day
200,000 requests/day
8,000 users access site daily
Where the app could reach (the point where I'd be panicking – this assumes approximately linear growth):
200,000 users total
20 avg requests/user/day
4,000,000 requests/day
160,000 users access site daily
The issue here is really just handling page requests. I only store short strings of text-based data, so DB size shouldn't be an issue.
What sort of server architecture should I be setting up from a hardware and software perspective? I need to think about caching, load balancing, multiple processing servers, multiple DB servers, etc, but don't know where to start.
Your projected growth of ~45 requests per second really isn't that intensive. I think a standard nginx load balancer in front of your web servers will handle everything. If your DB access isn't very intense, you will probably do fine with just one DB machine.
I really think the most important thing is not to do any premature optimization. Deal with issues as they come, or else you may end up wasting a lot of time.
There are tons of tutorials on caching, multi-server configurations, and load balancing.
Google is a good place to start.
Growing traffic is a standard problem; there is no lack of tutorials on these things.
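As one concrete starting point for the caching piece, Django's cache framework can be pointed at Memcached and applied per view. A minimal sketch follows; the backend class name differs between Django versions and the timeout is illustrative:

# settings.py
CACHES = {
    "default": {
        # On older Django versions this backend is named MemcachedCache.
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "127.0.0.1:11211",
    }
}

# views.py
from django.views.decorators.cache import cache_page

@cache_page(60 * 5)          # cache the rendered response for 5 minutes
def homepage(request):
    ...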