We are working on a new REST API that will be deployed on AWS Elastic Beanstalk using Docker. It uses Python Celery for scheduled jobs, which means separate processes need to run for the workers. Our current Docker configuration has three containers...
Multi-container Docker:
CONTAINER ID   IMAGE   COMMAND                  CREATED        STATUS          NAMES
09c3182122f7   sso     "gunicorn --reload --"   18 hours ago   Up 26 seconds   sso-api
f627c5391ee8   sso     "celery -A sso worker"   18 hours ago   Up 27 seconds   sso-worker
f627c5391ee8   sso     "celery beat -A sso -"   18 hours ago   Up 27 seconds   sso-beat
Conventional wisdom would suggest a multi-container configuration on Elastic Beanstalk, but since all the containers run the same code, a single-container configuration with Supervisord managing the processes might be more efficient and simpler from an ops point of view.
Single Container w/ Supervisord:
[program:api]
command=gunicorn --reload --bind 0.0.0.0:80 --pythonpath '/var/sso' sso.wsgi:application
directory=/var/sso
[program:worker]
command=celery -A sso worker -l info
directory=/var/sso
numprocs=2
process_name=%(program_name)s_%(process_num)02d  ; supervisord requires this when numprocs > 1
[program:beat]
command=celery beat -A sso -S djcelery.schedulers.DatabaseScheduler
directory=/var/sso
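A minimal Dockerfile sketch for the single-container approach, assuming the Supervisord config above is saved as supervisord.conf (the base image, paths, and requirements file are illustrative assumptions, not taken from the project):

# base image and paths are assumptions, not from the project
FROM python:3.6
COPY . /var/sso
RUN pip install -r /var/sso/requirements.txt && pip install supervisor
# supervisord.conf contains the [program:...] sections shown above
COPY supervisord.conf /etc/supervisord.conf
EXPOSE 80
# -n keeps supervisord in the foreground so the container stays alive
CMD ["supervisord", "-n", "-c", "/etc/supervisord.conf"]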
When setting up a multi-container configuration on AWS, memory is allocated to each container. My thinking is that it is more efficient to let the container OS handle memory allocation internally rather than to set it explicitly for each container. I do not know enough about how multi-container Docker runs under the hood on Elastic Beanstalk to intelligently recommend one way or the other.
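For reference, in the multi-container (ECS-backed) setup each container gets an explicit memory reservation in Dockerrun.aws.json. A hedged sketch mirroring the three containers above; the memory figures are assumptions, not recommendations:

{
  "AWSEBDockerrunVersion": 2,
  "containerDefinitions": [
    {
      "name": "sso-api",
      "image": "sso",
      "essential": true,
      "memory": 512,
      "portMappings": [{ "hostPort": 80, "containerPort": 80 }]
    },
    {
      "name": "sso-worker",
      "image": "sso",
      "essential": true,
      "memory": 256,
      "command": ["celery", "-A", "sso", "worker", "-l", "info"]
    },
    {
      "name": "sso-beat",
      "image": "sso",
      "essential": false,
      "memory": 128,
      "command": ["celery", "beat", "-A", "sso", "-S", "djcelery.schedulers.DatabaseScheduler"]
    }
  ]
}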
What is the optimal configuration for this situation?
Related
I have an AWS Elastic Beanstalk environment with an old WSGI setting (given below). I have no idea how this works internally; can anybody guide me?
NumProcesses: 7 -- number of processes
NumThreads: 5 -- number of threads in each process
How are memory and CPU used with this configuration, given that there are no memory and CPU settings at the Elastic Beanstalk level?
These parameters are part of the configuration options for Python environments:
aws:elasticbeanstalk:container:python.
They mean (from docs):
NumProcesses: The number of daemon processes that should be started for the process group when running WSGI applications (default value 1).
NumThreads: The number of threads to be created to handle requests in each daemon process within the process group when running WSGI applications (default value 15).
Internally, these values map to uwsgi or gunicorn configuration options in your EB environment. For example:
uwsgi --http :8000 --wsgi-file application.py --master --processes 4 --threads 2
Their impact on memory and cpu usage of your instance(s) is based on your application and how resource intensive it is. If you are not sure how to set them up, maybe keeping them at default values would be a good start.
The settings are also available in the EB console, under the Software category.
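If you prefer to keep them in version control, a sketch of setting both options from an .ebextensions config file (assuming a Python platform environment; the values are the ones from the question):

option_settings:
  aws:elasticbeanstalk:container:python:
    NumProcesses: 7
    NumThreads: 5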
To add on to @Marcin's answer:
Amazon Linux 2 uses gunicorn.
Workers are processes in gunicorn.
Gunicorn should only need 4-12 worker processes to handle hundreds or thousands of requests per second.
Gunicorn relies on the operating system to provide all of the load balancing when handling requests. Generally, we (gunicorn creators) recommend (2 x $num_cores) + 1 as the number of workers to start off with. While not overly scientific, the formula is based on the assumption that for a given core, one worker will be reading or writing from the socket while the other worker is processing a request.
To see how the option settings map to gunicorn, you can SSH into your EB instance:
$ eb ssh
$ cd /var/app/current/
$ cat Procfile
web: gunicorn --bind 127.0.0.1:8000 --workers=3 --threads=20 api.wsgi:application
--threads
A positive integer generally in the 2-4 x $(NUM_CORES) range. You’ll want to vary this a bit to find the best for your particular application’s work load.
The threads option only applies to the gthread worker type. Gunicorn's default worker class is sync; if you use the sync worker type and set the threads setting to more than 1, the gthread worker type will be used instead automatically.
Based on all the above, I would personally choose:
workers = (2 x $NUM_CORES ) + 1
threads = 4 x $NUM_CORES
For a t3.medium instance, which has 2 cores, that translates to:
workers = 5
threads = 8
Obviously, you need to tweak this for your use case and treat these as defaults that may well not be right for your particular application; read the references below to see how to choose the right setup for your use case.
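As a concrete sketch, a Procfile using those starting values for a 2-core instance could look like this (the api.wsgi module path is just carried over from the default Procfile above):

web: gunicorn --bind 127.0.0.1:8000 --workers=5 --threads=8 api.wsgi:application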
References:
REF: Gunicorn Workers and Threads
REF: https://medium.com/building-the-system/gunicorn-3-means-of-concurrency-efbb547674b7
REF: https://docs.gunicorn.org/en/stable/settings.html#worker-class
6-node Docker swarm (cluster): 3 managers, 3 workers.
After running the command below:
docker service create --name psight -p 8080:8080 --replicas 5 <image>
we see that mgr3 does not run any task (shown below):
$ docker service ps psight
ID NAME IMAGE NODE DESIRED_STATE CURRENT_STATE ERROR PORTS
yoj psight.1 image wrk2 Running Running 19 minutes ago
sjb psight.2 image wrk3 Running Running 19 minutes ago
vv6 psight.3 image mgr1 Running Running 19 minutes ago
scf psight.4 image mgr2 Running Running 19 minutes ago
7i2 psight.5 image wrk1 Running Running 19 minutes ago
But can the service still be reached from mgr3, given the actual state above?
As long as mgr3 is reachable as a manager (see Monitor swarm health), it should be able to perform the usual tasks of a manager.
If your instances are exposed on the wide area network with a public IP, with SSH open to the world (e.g. 0.0.0.0/0, ::/0), and you have your SSH key, then you should be able to connect to the instance.
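A quick way to confirm this from mgr3 itself; a sketch, with the user and address as placeholders for your own values:

$ ssh <user>@<mgr3-address>     # use your own key and the node's address
$ docker node ls                # only succeeds on a reachable, healthy manager
$ docker service ps psight      # a manager can inspect tasks even when none run locally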
I am using a custom Docker container on AWS Elastic Beanstalk to deploy a web application. I could set environment variables such as the API key, API secret, etc. in the AWS web console (Elastic Beanstalk => Configuration => Software Configuration). It worked fine.
However, after I changed some variables to other values which contain special characters, it did not work with this error:
Docker container quit unexpectedly on Mon Oct 24 13:32:22 UTC 2016:
Error: Unexpected end of key/value pairs
For help, use /usr/bin/supervisord -h
The Supervisord documentation says: "Values containing non-alphanumeric characters should be quoted."
My question is:
Is there any way to make environment variables containing special characters, set from AWS Elastic Beanstalk, properly reach the Docker container by quoting them?
I tried quoting them in the web console:
(screenshot: double-quoting the env vars in the web console)
But it did not work:
(screenshot: the double quotes were escaped)
AWS Elastic Beanstalk escapes double quotes by default, I guess.
For more info, my ebextensions config file:
option_settings:
"aws:elasticbeanstalk:application:environment":
MY_VARIABLE: ""
MY_VARIABLE_QUOTED: ""
Supervisord config file:
[program:run_app]
environment=MY_VARIABLE=%(ENV_MY_VARIABLE)s,MY_VARIABLE_QUOTED=%(ENV_MY_VARIABLE_QUOTED)s
command=gunicorn my_app.wsgi:application -w 2 -b 0.0.0.0:8000 -t 300 --max-requests=100
directory=/var/www/my_app
user=root
stdout_logfile=/var/www/my_app/logs/django_stdout.log
stderr_logfile=/var/www/my_app/logs/django_stderr.log
autorestart=true
redirect_stderr=true
stopwaitsecs = 600
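For clarity, the quoting that the Supervisord documentation refers to would look like this in the config above (a sketch; whether the value survives Elastic Beanstalk's own escaping before it reaches supervisord is exactly what is in question here):

[program:run_app]
; quote the expansions so values containing commas, equals signs, etc. stay intact
environment=MY_VARIABLE="%(ENV_MY_VARIABLE)s",MY_VARIABLE_QUOTED="%(ENV_MY_VARIABLE_QUOTED)s"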
We're using Django + Gunicorn + Nginx on our server. The problem is that after a while we see lots of gunicorn worker processes that have become orphans, and a lot of others that have become zombies. We can also see that some Gunicorn worker processes spawn other Gunicorn workers. Our best guess is that these workers become orphans after their parent workers have died.
Why do Gunicorn workers spawn child workers? Why do they die?! And how can we prevent this?
I should also mention that we've set the Gunicorn log level to debug and still don't see anything significant, other than the periodic log of the worker count, which reports the number of workers we asked for.
UPDATE
This is the line we used to run gunicorn:
gunicorn --env DJANGO_SETTINGS_MODULE=proj.settings proj.wsgi --name proj --workers 10 --user proj --group proj --bind 127.0.0.1:7003 --log-level=debug --pid gunicorn.pid --timeout 600 --access-logfile /home/proj/access.log --error-logfile /home/proj/error.log
In my case I deploy on Ubuntu servers (LTS releases, currently mostly 14.04 LTS) and I have never had problems with gunicorn daemons. I create a gunicorn.conf.py and launch gunicorn with this config from upstart, with a script like this in /etc/init/djangoapp.conf:
description "djangoapp website"
start on startup
stop on shutdown
respawn
respawn limit 10 5
script
cd /home/web/djangoapp
exec /home/web/djangoapp/bin/gunicorn -c gunicorn.conf.py -u web -g web djangoapp.wsgi
end script
I configure gunicorn with a .py config file, set up some options (details below), and deploy my app (with virtualenv) in /home/web/djangoapp, and I have no problems with zombie or orphan gunicorn processes.
I verified your options. The timeout can be a problem, but another one is that you don't set max-requests in your config; by default it is 0, so there is no automatic worker restart in your daemon, which can lead to memory leaks (http://gunicorn-docs.readthedocs.org/en/latest/settings.html#max-requests).
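A minimal gunicorn.conf.py sketch along those lines, reusing the values from the question's command line; the max_requests figures are illustrative assumptions:

# gunicorn.conf.py -- loaded with: gunicorn -c gunicorn.conf.py proj.wsgi
bind = "127.0.0.1:7003"
workers = 10
timeout = 600
# restart each worker after this many requests to contain slow memory leaks
max_requests = 500
max_requests_jitter = 50   # stagger the restarts so workers don't all recycle at once
loglevel = "debug"
accesslog = "/home/proj/access.log"
errorlog = "/home/proj/error.log"
pidfile = "gunicorn.pid"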
We will use a .sh file to start the gunicorn process, and later a supervisord configuration file. What is supervisord? There is an external how-to link about installing supervisord with Django, Nginx and Gunicorn here.
gunicorn_start.sh (remember to make the file executable with chmod +x):
#!/bin/sh
NAME="myDjango"
DJANGODIR="/var/www/html/myDjango"
NUM_WORKERS=3
echo "Starting myDjango -- Django Application"
cd $DJANGODIR
exec gunicorn -w $NUM_WORKERS $NAME.wsgi:application --bind 127.0.0.1:8001
mydjango_django.conf: remember to install supervisord on your OS, and copy this into the supervisord configuration folder:
[program:myDjango]
command=/var/www/html/myDjango/gunicorn_start.sh
user=root
autorestart=true
redirect_stderr=true
Later on, use the following supervisorctl commands.
Reload the daemon's configuration files, without add/remove (no restarts):
supervisorctl reread
Apply the new configuration and start all processes (note: restart does not reread config files; for that, see reread and update):
supervisorctl update
supervisorctl start all
Get all process status info:
supervisorctl status
This sounds like a timeout issue.
You have multiple timeouts going on and they all need to be in a descending order. It seems they may not be.
For example:
Nginx has a default timeout of 60 seconds
Gunicorn has a default timeout of 30 seconds
Django has a default timeout of 300 seconds
Postgres' default timeout is complicated, but let's say 60 seconds for this example.
In this example, after 30 seconds have passed, Django is still waiting for Postgres to respond. Gunicorn tells Django to stop, which in turn should tell Postgres to stop. Gunicorn will wait a certain amount of time for this to happen before it kills Django, leaving the Postgres query running as an orphan. The user then re-initiates their request, and this time the query takes longer because the old one is still running.
I see that you have set your Gunicorn timeout to 300 seconds.
That would probably mean that Nginx tells Gunicorn to stop after 60 seconds, Gunicorn waits for Django, which waits for Postgres or any other underlying processes, and when Nginx gets tired of waiting it kills Gunicorn, leaving Django hanging.
This is still just a theory, but it is a very common problem and hopefully leads you and any others experiencing similar problems, to the right place.
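A sketch of putting the nginx/gunicorn pair into descending order, assuming an nginx reverse proxy in front of gunicorn; the directive values are illustrative, with the gunicorn timeout taken from the command in the question:

# nginx site config: give the proxy slightly more time than gunicorn's own timeout
location / {
    proxy_pass http://127.0.0.1:7003;
    proxy_read_timeout 620s;
}
# gunicorn: kill a silently hanging worker after 600 seconds
gunicorn proj.wsgi --bind 127.0.0.1:7003 --timeout 600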
I have a Django app deployed to Heroku, with a worker process running celery (+ celerycam for monitoring). I am using RedisToGo's Redis database as a broker. I noticed that Redis keeps running out of memory.
This is what my procfile looks like:
web: python app/manage.py run_gunicorn -b "0.0.0.0:$PORT" -w 3
worker: python lipo/manage.py celerycam & python app/manage.py celeryd -E -B --loglevel=INFO
Here's the output of KEYS '*':
"_kombu.binding.celeryd.pidbox"
"celeryev.643a99be-74e8-44e1-8c67-fdd9891a5326"
"celeryev.f7a1d511-448b-42ad-9e51-52baee60e977"
"_kombu.binding.celeryev"
"celeryev.d4bd2c8d-57ea-4058-8597-e48f874698ca"
"_kombu.binding.celery"
celeryev.643a99be-74e8-44e1-8c67-fdd9891a5326 is getting filled up with these messages:
{"sw_sys": "Linux", "clock": 1, "timestamp": 1325914922.206671, "hostname": "064d9ffe-94a3-4a4e-b0c2-be9a85880c74", "type": "worker-online", "sw_ident": "celeryd", "sw_ver": "2.4.5"}
Any idea what I can do to purge these messages periodically?
Would this be a solution?
In addition to the _kombu.bindings.celeryev set, there would be e.g. celeryev.i-am-alive. keys with a TTL set (e.g. 30 sec);
the celeryev process adds itself to the bindings and periodically (e.g. every 5 sec) updates its celeryev.i-am-alive. key to reset the TTL;
before sending an event, the worker process checks not only SMEMBERS on _kombu.bindings.celeryev but the individual celeryev.i-am-alive. keys as well, and if a key is not found (expired) it gets removed from _kombu.bindings.celeryev (and maybe DEL celeryev. or EXPIRE celeryev. commands are executed).
We can't just use the KEYS command because it is O(N), where N is the total number of keys in the DB. TTLs can be tricky on Redis < 2.1, though.
EXPIRE celeryev. instead of DEL celeryev. could be used to allow a temporarily offline celeryev consumer to revive, but I don't know whether it's worth it.
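A rough redis-cli sketch of that scheme; the celeryev.<node> key names and the binding entry are hypothetical placeholders, not the exact keys kombu uses:

# the consumer refreshes its liveness key every few seconds with a short TTL
SETEX celeryev.<node>.i-am-alive 30 1
# before publishing an event, the worker checks each registered consumer
SMEMBERS _kombu.binding.celeryev
EXISTS celeryev.<node>.i-am-alive
# if EXISTS returns 0 the consumer is gone: drop it and expire (or delete) its queue
SREM _kombu.binding.celeryev <binding-entry>
EXPIRE celeryev.<node> 60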