WSGI number of processes and threads settings in AWS Beanstalk - amazon-web-services

I have an AWS Elastic Beanstalk environment with an old WSGI setting (given below). I have no idea how this works internally; can anybody guide me?
NumProcesses:7 -- number of processes
NumThreads:5 -- number of threads in each process
How are memory and CPU used with this configuration, given that there are no memory or CPU settings at the Elastic Beanstalk level?

These parameters are configuration options for the Python platform, in the namespace:
aws:elasticbeanstalk:container:python
They mean (from the docs):
NumProcesses: The number of daemon processes that should be started for the process group when running WSGI applications (default value 1).
NumThreads: The number of threads to be created to handle requests in each daemon process within the process group when running WSGI applications (default value 15).
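For example, they can be set from an .ebextensions config file; a minimal sketch (the values are only illustrative):
option_settings:
  aws:elasticbeanstalk:container:python:
    NumProcesses: 3
    NumThreads: 20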
Internally, these values map to uwsgi or gunicorn configuration options in your EB environment. For example:
uwsgi --http :8000 --wsgi-file application.py --master --processes 4 --threads 2
Their impact on the memory and CPU usage of your instance(s) depends on your application and how resource-intensive it is. If you are not sure how to set them, keeping the default values is a good start.
The settings are also available in the EB console, under the Software category:

To add on to Marcin's answer:
Amazon Linux 2 uses Gunicorn.
Workers are processes in Gunicorn.
Gunicorn should only need 4-12 worker processes to handle hundreds or thousands of requests per second.
Gunicorn relies on the operating system to provide all of the load balancing when handling requests. Generally, we (gunicorn creators) recommend (2 x $num_cores) + 1 as the number of workers to start off with. While not overly scientific, the formula is based on the assumption that for a given core, one worker will be reading or writing from the socket while the other worker is processing a request.
To see how the option settings map to Gunicorn, you can SSH into your EB instance and inspect the Procfile:
$ eb ssh
$ cd /var/app/current/
$ cat Procfile
web: gunicorn --bind 127.0.0.1:8000 --workers=3 --threads=20 api.wsgi:application
--threads
A positive integer generally in the 2-4 x $(NUM_CORES) range. You’ll want to vary this a bit to find the best for your particular application’s work load.
The threads option only applies to the gthread worker type. Gunicorn's default worker class is sync; if you try to use the sync worker type and set the threads setting to more than 1, the gthread worker type will be used instead automatically.
Based on all the above, I would personally choose:
workers = (2 x $NUM_CORES ) + 1
threads = 4 x $NUM_CORES
For a t3.medium instance, which has 2 cores, that translates to:
workers = 5
threads = 8
Obviously, you need to tweak this for your use case and treat these as defaults that could very well not be right for your particular application. Read the references below to see how to choose the right setup for your use case.
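If you want to derive these numbers on the instance itself, here is a tiny Python sketch of the rule of thumb above (not an official recommendation):
# compute workers/threads from the core count, per the rule of thumb above
import multiprocessing

num_cores = multiprocessing.cpu_count()
workers = (2 * num_cores) + 1   # e.g. 5 on a 2-core t3.medium
threads = 4 * num_cores         # e.g. 8 on a 2-core t3.medium
print("--workers=%d --threads=%d" % (workers, threads))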
References:
REF: Gunicorn Workers and Threads
REF: https://medium.com/building-the-system/gunicorn-3-means-of-concurrency-efbb547674b7
REF: https://docs.gunicorn.org/en/stable/settings.html#worker-class

Related

uWSGI downtime when restarting

I have a problem with uWSGI every time I restart the server after a code update.
When I restart uWSGI using "sudo restart accounting", there's a small gap between the stop and the start that results in downtime and drops all in-flight requests.
When I try "sudo reload accounting", it works, but my memory usage doubles. When I run "ps aux | grep accounting", it shows that I have 10 running processes (accounting.ini) instead of 5, and it freezes up my server when the memory hits the limit.
accounting.ini
I am running
Ubuntu 14.04
Django 1.9
nginx 1.4.6
uwsgi 2.0.12
This is how uWSGI does a graceful reload: it keeps the old processes until their requests are served and creates new ones that will take over incoming requests.
Read Things that could go wrong
Do not forget, your workers/threads that are still running requests
could block the reload (for various reasons) for more seconds than
your proxy server could tolerate.
And this
Another important step of graceful reload is to avoid destroying
workers/threads that are still managing requests. Obviously requests
could be stuck, so you should have a timeout for running workers (in
uWSGI it is called the “worker’s mercy” and it has a default value of
60 seconds).
So I would recommend trying worker-reload-mercy.
The default value is to wait 60 seconds; just lower it to something that your server can handle.
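For example, something along these lines in accounting.ini (the value is only a guess; pick whatever your proxy can tolerate):
; in the [uwsgi] section
; cap how long a busy worker may delay the reload (default: 60 seconds)
worker-reload-mercy = 8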
Tell me if it worked.
uWSGI chain reload
This is another attempt to fix your issue. As you mentioned, your uWSGI workers are restarting in the manner described below:
Send a SIGHUP signal to the master.
Wait for running workers.
Close all of the file descriptors except the ones mapped to sockets.
Call exec() on itself.
One of the cons of this kind of reload might be stuck workers.
Additionally, you report that your server crashes when uWSGI maintains 10 processes (5 old and 5 new ones).
I propose trying chain reload. A direct quote from the documentation explains this kind of reload best:
When triggered, it will restart one worker at time, and the following worker is not reloaded until the previous one is ready to accept new requests.
It means that you will not have 10 processes on your server but only 5.
Config that should work:
# your .ini file
lazy-apps = true
touch-chain-reload = /path/to/reloadFile
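With that in place, a deploy can trigger the chain reload simply by touching the file:
$ touch /path/to/reloadFile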
Some resources on chain reload and other kinds are in links below:
Chain reloading uwsgi docs
uWSGI graceful Python code deploy

ElasticBeanstalk Docker, one Container or multiple Containers?

We are working on a new REST API that will be deployed on AWS ElasticBeanstalk using Docker. It uses Python Celery for scheduled jobs which means separate process need to run for workers, our current Docker configuration has three containers...
Multicontainer Docker:
09c3182122f7 sso "gunicorn --reload --" 18 hours ago Up 26 seconds sso-api
f627c5391ee8 sso "celery -A sso worker" 18 hours ago Up 27 seconds sso-worker
f627c5391ee8 sso "celery beat -A sso -" 18 hours ago Up 27 seconds sso-beat
Conventional wisdom would suggest we should use a Multi-container configuration on ElasticBeanstalk but since all containers use the same code, using a single container configuration with Supervisord to manage processes might be more efficient and simpler from an OPS point of view.
Single Container w/ Supervisord:
[program:api]
command=gunicorn --reload --bind 0.0.0.0:80 --pythonpath '/var/sso' sso.wsgi:application
directory=/var/sso
[program:worker]
command=celery -A sso worker -l info
directory=/var/sso
numprocs=2
[program:beat]
command=celery beat -A sso -S djcelery.schedulers.DatabaseScheduler
directory=/var/sso
When setting up a multi-container configuration on AWS, memory is allocated to each container. My thinking is that it is more efficient to let the container OS handle memory allocation internally rather than to explicitly assign it to each container. I do not know enough about how multi-container Docker runs under the hood on ElasticBeanstalk to intelligently recommend one way or the other.
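For reference, the per-container memory allocation I'm referring to is set in the multicontainer Dockerrun.aws.json, roughly like this (a trimmed sketch; the names and values are only placeholders):
{
  "AWSEBDockerrunVersion": 2,
  "containerDefinitions": [
    {
      "name": "sso-api",
      "image": "sso",
      "essential": true,
      "memory": 512,
      "portMappings": [{ "hostPort": 80, "containerPort": 80 }]
    },
    {
      "name": "sso-worker",
      "image": "sso",
      "essential": true,
      "memory": 256,
      "command": ["celery", "-A", "sso", "worker"]
    },
    {
      "name": "sso-beat",
      "image": "sso",
      "essential": true,
      "memory": 128,
      "command": ["celery", "beat", "-A", "sso"]
    }
  ]
}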
What is the optimal configuration for this situation?

EC2, 16.04, Systemd, Supervisord, & Python

I have a service written in python 2.7 and managed by supervisord on an Ubuntu 16.04 EC2 spot instance.
On system startup I have a number of systemd tasks that need to take place and finish prior to supervisord starting the service.
When the instance is about to shutdown, I need supervisord to capture the event and tell the service to gracefully halt. The service will need to stop processing and return any workloads to the queue prior to exiting gracefully.
What would be the optimal way to manage system startup in this scenario?
What would be the optimal way to manage system shutdown in this scenario?
How do I best handle the interaction between supervisord and the service?
First, we need to install a systemd task that we want to run prior to supervisor starting up. Let's create a script, /usr/bin/pre-supervisor.sh, that will handle performing that work for us and create the /lib/systemd/system/pre-supervisor.service for systemd.
[Unit]
Description=Task to run prior to supervisor Starting up
After=cloud-init.service
Before=supervisor.service
Requires=cloud-init.service
[Service]
Type=oneshot
WorkingDirectory=/usr/bin
ExecStart=/usr/bin/pre-supervisor.sh
RemainAfterExit=no
TimeoutSec=90
User=ubuntu
# Output needs to appear in instance console output
StandardOutput=journal+console
[Install]
WantedBy=multi-user.target
As you can see, this will run after the ec2 cloud-init.service completes, and prior to the supervisor.service.
Next, let us modify the /lib/systemd/system/supervisor.service to run After the pre-supervisor.service completes, instead of after network.target.
[Unit]
Description=Supervisor process control system for UNIX
Documentation=http://supervisord.org
After=pre-supervisor.service
[Service]
ExecStart=/usr/bin/supervisord -n -c /etc/supervisor/supervisord.conf
ExecStop=/usr/bin/supervisorctl $OPTIONS shutdown
ExecReload=/usr/bin/supervisorctl -c /etc/supervisor/supervisord.conf $OPTIONS reload
KillMode=process
Restart=on-failure
RestartSec=50s
[Install]
WantedBy=multi-user.target
That will ensure that our pre-supervisor tasks run prior to supervisor starting up.
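After creating or editing the unit files, reload systemd and enable the new unit so it runs at boot (standard systemd housekeeping, not specific to this setup):
$ sudo systemctl daemon-reload
$ sudo systemctl enable pre-supervisor.service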
Because these are spot instances, AWS exposes the termination notice via the instance metadata URL, so I simply need to inject something like:
if requests.get("http://169.254.169.254/latest/meta-data/spot/termination-time").status_code == 200
into my Python service, have it check every five seconds or so, and gracefully shut down as soon as the termination notice appears.
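A minimal sketch of that polling loop, assuming the requests library is available (the graceful-shutdown hook itself depends on the service):
import time
import requests

TERMINATION_URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"

def wait_for_termination_notice(poll_interval=5):
    # Poll the instance metadata; the endpoint returns 200 only once a
    # spot termination has been scheduled for this instance.
    while True:
        try:
            resp = requests.get(TERMINATION_URL, timeout=2)
            if resp.status_code == 200:
                return resp.text  # the scheduled termination time
        except requests.RequestException:
            pass  # metadata service hiccup; keep polling
        time.sleep(poll_interval)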

Gunicorn sync workers spawning processes

We're using Django + Gunicorn + Nginx on our server. The problem is that after a while we see lots of Gunicorn worker processes that have become orphans, and a lot of other ones that have become zombies. We can also see that some of the Gunicorn worker processes spawn other Gunicorn workers. Our best guess is that these workers become orphans after their parent workers have died.
Why do Gunicorn workers spawn child workers? Why do they die? And how can we prevent this?
I should also mention that we've set the Gunicorn log level to debug and still don't see anything significant, other than the periodic log of the worker count, which reports the number of workers we asked for.
UPDATE
This is the line we used to run gunicorn:
gunicorn --env DJANGO_SETTINGS_MODULE=proj.settings proj.wsgi --name proj --workers 10 --user proj --group proj --bind 127.0.0.1:7003 --log-level=debug --pid gunicorn.pid --timeout 600 --access-logfile /home/proj/access.log --error-logfile /home/proj/error.log
In my case I deploy on Ubuntu servers (LTS releases, now almost all 14.04 LTS) and have never had problems with Gunicorn daemons. I create a gunicorn.conf.py and launch Gunicorn with this config from Upstart, with a script like this in /etc/init/djangoapp.conf:
description "djangoapp website"
start on startup
stop on shutdown
respawn
respawn limit 10 5
script
cd /home/web/djangoapp
exec /home/web/djangoapp/bin/gunicorn -c gunicorn.conf.py -u web -g web djangoapp.wsgi
end script
I configure Gunicorn with a .py config file, set up some options (details below), and deploy my app (with virtualenv) in /home/web/djangoapp, with no zombie or orphan Gunicorn processes.
I checked your options: the timeout can be a problem, but another one is that you don't set max-requests in your config. By default it is 0, so there is no automatic worker restart in your daemon, which can lead to memory leaks (http://gunicorn-docs.readthedocs.org/en/latest/settings.html#max-requests).
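For example, in gunicorn.conf.py something along these lines (the values are only illustrative, tune them for your app):
# gunicorn.conf.py (sketch)
workers = 10
timeout = 600
max_requests = 1000        # recycle each worker after this many requests
max_requests_jitter = 50   # stagger restarts so workers don't all recycle at once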
We will use a .sh file to start the Gunicorn process, and later a supervisord configuration file. What is supervisord? There is an external how-to on installing supervisord with Django, Nginx and Gunicorn here.
gunicorn_start.sh (remember to make it executable with chmod +x):
#!/bin/sh
NAME="myDjango"
DJANGODIR="/var/www/html/myDjango"
NUM_WORKERS=3
echo "Starting myDjango -- Django Application"
cd $DJANGODIR
exec gunicorn -w $NUM_WORKERS $NAME.wsgi:application --bind 127.0.0.1:8001
mydjango_django.conf (remember to install supervisord on your OS and copy this into its configuration folder):
[program:myDjango]
command=/var/www/html/myDjango/gunicorn_start.sh
user=root
autorestart=true
redirect_stderr=true
Later on, use these commands:
Reload the daemon's configuration files, without adding/removing programs (no restarts):
supervisorctl reread
Start all processes (note: restart does not reread config files; for that, see reread and update):
supervisorctl start all
Get status info for all processes:
supervisorctl status
This sounds like a timeout issue.
You have multiple timeouts going on and they all need to be in a descending order. It seems they may not be.
For example:
Nginx has a default timeout of 60 seconds
Gunicorn has a default timeout of 30 seconds
Django has a default timeout of 300 seconds
Postgres's default timeout is complicated, but let's assume 60 seconds for this example.
In this example, after 30 seconds have passed, Django is still waiting for Postgres to respond. Gunicorn tells Django to stop, which in turn should tell Postgres to stop. Gunicorn will wait a certain amount of time for this to happen before it kills Django, leaving the Postgres query running as an orphan. The user will re-initiate their request, and this time the query will take longer because the old one is still running.
I see that you have set your Gunicorn timeout to 600 seconds.
This would probably mean that Nginx tells Gunicorn to stop after 60 seconds, Gunicorn may wait for Django who waits for Postgres or any other underlying processes, and when Nginx gets tired of waiting, it kills Gunicorn, leaving Django hanging.
This is still just a theory, but it is a very common problem and hopefully leads you and any others experiencing similar problems, to the right place.
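As a rough illustration of keeping the timeouts in descending order from the outside in (the numbers are only examples, not recommendations):
# nginx server/location block: the outermost timeout, larger than anything behind it
proxy_read_timeout 120s;
# gunicorn: should time out (and log) before nginx gives up on the upstream
gunicorn --timeout 90 ... proj.wsgi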

Celery and Redis keep running out of memory

I have a Django app deployed to Heroku, with a worker process running celery (+ celerycam for monitoring). I am using RedisToGo's Redis database as a broker. I noticed that Redis keeps running out of memory.
This is what my procfile looks like:
web: python app/manage.py run_gunicorn -b "0.0.0.0:$PORT" -w 3
worker: python lipo/manage.py celerycam & python app/manage.py celeryd -E -B --loglevel=INFO
Here's the output of KEYS '*':
"_kombu.binding.celeryd.pidbox"
"celeryev.643a99be-74e8-44e1-8c67-fdd9891a5326"
"celeryev.f7a1d511-448b-42ad-9e51-52baee60e977"
"_kombu.binding.celeryev"
"celeryev.d4bd2c8d-57ea-4058-8597-e48f874698ca"
`_kombu.binding.celery"
celeryev.643a99be-74e8-44e1-8c67-fdd9891a5326 is getting filled up with these messages:
{"sw_sys": "Linux", "clock": 1, "timestamp": 1325914922.206671, "hostname": "064d9ffe-94a3-4a4e-b0c2-be9a85880c74", "type": "worker-online", "sw_ident": "celeryd", "sw_ver": "2.4.5"}
Any idea what I can do to purge these messages periodically?
Would this be a solution?
in addition to the _kombu.bindings.celeryev set there will be e.g. celeryev.i-am-alive. keys with a TTL set (e.g. 30 sec);
the celeryev process adds itself to the bindings and periodically (e.g. every 5 sec) updates the celeryev.i-am-alive. key to reset the TTL;
before sending an event, the worker process checks not only smembers on _kombu.bindings.celeryev but also the individual celeryev.i-am-alive. keys, and if a key is not found (expired) then the consumer gets removed from _kombu.bindings.celeryev (and maybe the del celeryev. or expire celeryev. commands are executed);
we can't just use the KEYS command because it is O(N), where N is the total number of keys in the DB. TTLs can be tricky on Redis < 2.1, though.
expire celeryev. instead of del celeryev. could be used in order to allow a temporarily offline celeryev consumer to revive, but I don't know if it's worth it.
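A rough redis-py sketch of the heartbeat scheme above (hypothetical helper functions; the key names just follow the proposed convention, this is not a drop-in fix):
import time
import redis

r = redis.Redis()
HEARTBEAT_TTL = 30  # seconds before a silent consumer is considered gone

def heartbeat(consumer_id):
    # Refresh this consumer's "i-am-alive" key; call it every few seconds.
    r.setex("celeryev.i-am-alive.%s" % consumer_id, HEARTBEAT_TTL, int(time.time()))

def purge_if_dead(consumer_id):
    # If the heartbeat key has expired, drop the consumer's event queue key.
    if not r.exists("celeryev.i-am-alive.%s" % consumer_id):
        r.delete("celeryev.%s" % consumer_id)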