How to manage automatic restarts of multiple Clojure apps with Supervisord?

I do not have any experience with Supervisord. If I look here:
https://tgallant.github.io/clojure/supervisord-with-clojure.html
I see this example of managing a Clojure app with Supervisord:
[program:blog-checker]
command= /usr/local/bin/java -jar target/blog-checker-0.1.0-SNAPSHOT-standalone.jar
directory=/usr/local/www/blog-checker
autostart=true
autorestart=true
startretries=3
user=www
If I would like to use Supervisord to keep 3 instances of my app running, do I create 3 separate entries, or is there a way to have just this one entry, but tell it to keep 3 instances going?

Does your application listen on a port? If so, here's an example of how you would keep four instances of a networked application running (in this case, ApiAxle's API):
[program:apiaxle-api]
process_name = apiaxle-api-%(process_num)s
command = apiaxle-api -f 1 -p %(process_num)s
directory = /home/apiaxle/apiaxle/api
numprocs = 4
numprocs_start = 3000
user=apiaxle
redirect_stderr=true
stdout_logfile=/var/log/apiaxle/api-%(process_num)s-stdout.log
stderr_logfile=/var/log/apiaxle/api-%(process_num)s-stderr.log
Four instances listening on ports 3000, 3001, 3002 and 3003. %(process_num)s is expanded by Supervisor to each process's number, which here doubles as the port because numprocs_start is 3000.
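If your Clojure app can take its port on the command line, the same pattern applies. A sketch for the blog-checker example from the question, where the --port flag is a hypothetical placeholder for however your jar actually accepts a port:
[program:blog-checker]
process_name = blog-checker-%(process_num)s
command = /usr/local/bin/java -jar target/blog-checker-0.1.0-SNAPSHOT-standalone.jar --port %(process_num)s
directory = /usr/local/www/blog-checker
numprocs = 3
numprocs_start = 3000
autostart = true
autorestart = true
user = www
This keeps three instances running on ports 3000, 3001 and 3002.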

You could do it by specifying numprocs and process_name, e.g.:
numprocs = 3
process_name = %(program_name)s_%(process_num)02d
And unless you want to count your processes from zero, you should specify the numprocs_start variable as well:
numprocs_start = 1
You can find the complete Supervisor documentation at http://supervisord.org/.
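Putting that together with the config from the question, a minimal sketch for three identical instances (assuming the jar needs no per-instance port) could look like:
[program:blog-checker]
command = /usr/local/bin/java -jar target/blog-checker-0.1.0-SNAPSHOT-standalone.jar
directory = /usr/local/www/blog-checker
process_name = %(program_name)s_%(process_num)02d
numprocs = 3
numprocs_start = 1
autostart = true
autorestart = true
startretries = 3
user = www
supervisorctl will then show three separate processes (blog-checker_01 through blog-checker_03 under the blog-checker group), each restarted independently.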

Related

process 'forkPoolworker-5' pid:111 exited with 'signal 9 (SIGKILL)'

Hello everyone, I hope someone can help me.
python:3.8
Django==4.0.4
celery==5.2.1
I am using Python/Django/Celery. When I fetch data from Hive via SQL, my Celery worker fails with the error "process 'forkPoolworker-5' pid:111 exited with 'signal 9 (SIGKILL)'". After that my task never finishes and the TCP connection is closed. What can I do to solve this?
I have tried:
CELERYD_MAX_TASKS_PER_CHILD = 1  # maximum number of tasks per worker child
CELERYD_CONCURRENCY = 3  # maximum concurrency per worker
CELERYD_MAX_MEMORY_PER_CHILD = 1024*1024*2  # each child may use up to 2 GB of memory
CELERY_TASK_RESULT_EXPIRES = 60 * 60 * 24 * 3
-Ofair
but none of these solved the problem.
SIGKILL is sent by the system, most likely because of memory (or storage) pressure. Monitor how much memory a Celery task takes by running the worker with the -P solo option or with -c 1, and allocate sufficient memory accordingly.
To check memory usage, use either pmap <pid> or ps -a -o rss,vsz. Look up rss and vsz for more details (in short, rss is resident RAM and vsz is virtual memory).
CELERYD_MAX_TASKS_PER_CHILD = 1 recycles the process after every task, so CELERYD_MAX_MEMORY_PER_CHILD has no effect: the worker waits for the task to complete before enforcing the memory limit on the running child process.
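A minimal way to measure this (the project name proj below is a placeholder for your own Celery app) is to run a single worker process and watch its resident memory while the Hive task runs:
celery -A proj worker -P solo -l info
# in another shell, once you know the worker's PID:
ps -o rss,vsz -p <pid>
If rss climbs toward the machine or container limit during the query, the OOM killer is the likely source of the SIGKILL.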

multi process Daphne with supervisor, getting [Errno 88] Socket operation on non-socket

Daphne and Django Channels work fine from the command line or as a single process, but when I start them with Supervisor, the error below occurs.
2020-02-18 12:40:35,995 CRITICAL Listen failure: [Errno 88] Socket operation on non-socket
My config file is
[program:asgi]
socket=tcp://localhost:9000
directory=/root/test/test/
command=daphne -u /run/daphne/daphne%(process_num)d.sock --endpoint fd:fileno=0 --access-log - --proxy-headers test.asgi:application
# Number of processes to startup, roughly the number of CPUs you have
numprocs=2
# Give each process a unique name so they can be told apart
process_name=asgi%(process_num)d
# Automatically start and recover processes
autostart=true
autorestart=true
# Choose where you want your log to go
stdout_logfile=/root/test/test/script/asgi.log
redirect_stderr=true
[supervisord]
[supervisorctl]
Any ideas? Thanks!
There seems to be a bug where either Python or Twisted is unable to bind to file descriptor 0, as referenced here: https://github.com/django/daphne/issues/263
Try to bind the socket to another fd, say 10 like so:
daphne -u /run/daphne/daphne%(process_num)d.sock --fd 10 --access-log - --proxy-headers test.asgi:application
If that doesn't work, try checking whether you have read and write access to the /run/daphne/ folder itself.
The first line, [program:asgi], must instead be [fcgi-program:asgi].
If you study the documentation carefully, it is clear that the first line of the config is incorrect:
https://channels.readthedocs.io/en/stable/deploying.html#example-setups
Look carefully at the config given in that example.
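Applied to the config from the question, a corrected sketch (only the section header is changed, exactly as suggested above; the fcgi-program section is what makes Supervisor create the socket and hand it to Daphne on fd 0) would be:
[fcgi-program:asgi]
socket=tcp://localhost:9000
directory=/root/test/test/
command=daphne -u /run/daphne/daphne%(process_num)d.sock --endpoint fd:fileno=0 --access-log - --proxy-headers test.asgi:application
numprocs=2
process_name=asgi%(process_num)d
autostart=true
autorestart=true
stdout_logfile=/root/test/test/script/asgi.log
redirect_stderr=true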

Unable to get CAP_CHOWN and CAP_DAC_OVERRIDE working for regular user

My requirement
My python server runs as a regular user on RHEL
But it needs to create files/directories in places it does not have write access to.
It also needs to chown those files to arbitrary UIDs/GIDs.
My approach
I am trying to do this in a capability-only environment, with no setuid.
I am trying to make use of cap_chown and cap_dac_override capabilities.
But I am totally lost on how to get this working under systemd.
At present I have following in the service file:
#cat /usr/lib/systemd/system/my_server.service
[Service]
Type=simple
SecureBits=keep-caps
User=testuser
CapabilityBoundingSet=~
Capabilities=cap_dac_override,cap_chown=eip
ExecStart=/usr/bin/linux_capability_test.py
And following on the binary itself:
# getcap /usr/bin/linux_capability_test.py
/usr/bin/linux_capability_test.py = cap_chown,cap_dac_override+ei
But this answer says that it will never work with scripts:
Is there a way for non-root processes to bind to "privileged" ports on Linux?
With the current setting, the capabilities I have for the running process are:
# ps -ef | grep lin
testuser 28268 1 0 22:31 ? 00:00:00 python /usr/bin/linux_capability_test.py
# getpcaps 28268
Capabilities for `28268': = cap_chown,cap_dac_override+i
But if I try to create a file in /etc/ from within that script:
try:
    file_name = '/etc/junk'
    with open(file_name, 'w') as f:
        os.utime(file_name, None)
except OSError as e:
    print(e)  # fails with 'Permission denied'
It fails with 'Permission denied'.
Is it the same situation for me, i.e. it won't work?
Can I use the python-prctl module here to get it working?
setuid will not work with scripts because, due to the way scripts execute, it is a security hole. There are several documents on this; you can start with the Wikipedia page.
A good workaround is to write a small C program that launches your Python script with hard-coded paths to the interpreter and the script. A thorough discussion of all the issues may be found here.
Update: here is a method to do this, though I am not sure it is the best one, using the python-prctl module:
1. Ditch 'User=testuser' from my_server.service
2. Start server as root
3. Set 'keep_caps' flag True
4. Do 'setgroups, setgid and setuid'
5. And immediately limit the permitted capability set to 'DAC_OVERRIDE' and 'CHOWN' capability only
6. Set the effective capability for both to True
Here is the code for this:
import os
import prctl

# Keep permitted capabilities across the UID/GID change
prctl.securebits.keep_caps = True
os.setgroups([160])
os.setgid(160)
os.setuid(160)
# Drop everything except CHOWN and DAC_OVERRIDE from the permitted set
prctl.cap_permitted.limit(prctl.CAP_CHOWN, prctl.CAP_DAC_OVERRIDE)
# Re-enable the two capabilities in the effective set
prctl.cap_effective.dac_override = True
prctl.cap_effective.chown = True
DONE !!
Based upon our discussion above, I did the following:
[Service]
Type=simple
User=testuser
SecureBits=keep-caps
Capabilities=cap_chown,cap_dac_override=i
ExecStart=/usr/bin/linux_capability_test.py
This starts the server with both those capabilities as inheritable.
I wrote a small C test program to chown the file:
#include <unistd.h>

int main()
{
    int ret = 0;
    ret = chown("/etc/junk", 160, 160);
    return ret;
}
I set the following on the compiled binary:
chown testuser:testuser /usr/bin/chown_c
chmod 550 /usr/bin/chown_c
setcap cap_chown,cap_dac_override=ie /usr/bin/chown_c
The server does the following to call the binary:
import os
import prctl

# Pass CHOWN and DAC_OVERRIDE on to the exec'ed binary via the inheritable set
prctl.cap_inheritable.chown = True
prctl.cap_inheritable.dac_override = True
os.execve('/usr/bin/chown_c', ['/usr/bin/chown_c'], os.environ)
And I was able to get the desired result
# ll /etc/junk
-rw-r--r-- 1 root root 0 Aug 8 22:33 /etc/junk
# python capability_client.py
# ll /etc/junk
-rw-r--r-- 1 testuser testuser 0 Aug 8 22:33 /etc/junk

django+uwsgi huge excessive memory usage issue

I have a Django + uwsgi based website. Some of the tables have almost 1 million rows.
After some use of the website, the VIRT memory used by the uwsgi process reaches almost 20 GB, which almost kills my server.
Could someone tell me what may have caused this problem? Are my tables too big? (Unlikely; Pinterest has much more data.) For now I have had to use reload-on-as = 10024 and reload-on-rss = 4800 to kill the workers every few minutes, which is painful.
Any help?
Here is my uwsgi.ini file
[uwsgi]
chdir = xxx
module = xxx.wsgi
master = true
processes = 2
socket =127.0.0.1:8004
chmod-socket = 664
no-orphans = true
#limit-as=256
reload-on-as= 10024
reload-on-rss= 4800
max-requests=250
uid = www-data
gid = www-data
#chmod-socket = 777
chown-socket = www-data
# clear environment on exit
vacuum = true
After some digging on Stack Overflow and Google, here is the solution.
Read this on how Django memory works and why it keeps going up.
Read this on profiling a Django app.
Then I figured out that the main parameter to set in uwsgi.ini is max-requests. Originally I had set it to 2000; now it is 50, so uwsgi respawns workers before their memory grows too much.
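The relevant lines of uwsgi.ini then look like this (a sketch of the changed values only; the rest of the file stays as shown above):
max-requests = 50
reload-on-as = 10024
reload-on-rss = 4800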
Then I tried to figure out which request was pulling huge query results from the database. I ended up finding this little line:
amount = sum(x.amount for x in Project.objects.all())
The Project table has over 1 million complex entries, so this occupied a huge amount of memory. Since I commented it out, everything runs smoothly.
So it is good to understand how Django queries interact with the database.
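For reference, a sum like that can be pushed into the database instead of instantiating roughly a million model objects in Python; a sketch, assuming the same Project model with an amount field:
from django.db.models import Sum

# Let the database compute the sum; only one small dict comes back to Python
amount = Project.objects.aggregate(total=Sum('amount'))['total']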
(Sorry I don't have enough reputation to comment - so apologies if this answer doesn't help in your case)
I had the same issue running Django on uwsgi/nginx with uwsgi controlled via Supervisor. The Supervisor-managed uwsgi process started using lots of memory and consuming 100% CPU, so the only option was to repeatedly restart uwsgi.
Turned out the solution was to set up logging in the uwsgi.ini file:
logto = /var/log/uwsgi.log
There is some discussion on this here: https://github.com/unbit/uwsgi/issues/296

Celery - run different workers on one server

I have 2 kinds of tasks:
Type 1: a few small, high-priority tasks.
Type 2: lots of heavy tasks with lower priority.
Initially I had a simple configuration with default routing and no routing keys. It was not sufficient: sometimes all workers were busy with Type 2 tasks, so Type 1 tasks were delayed.
I've added routing keys:
CELERY_DEFAULT_QUEUE = "default"
CELERY_QUEUES = {
    "default": {
        "binding_key": "task.#",
    },
    "highs": {
        "binding_key": "starter.#",
    },
}
CELERY_DEFAULT_EXCHANGE = "tasks"
CELERY_DEFAULT_EXCHANGE_TYPE = "topic"
CELERY_DEFAULT_ROUTING_KEY = "task.default"
CELERY_ROUTES = {
    "search.starter.start": {
        "queue": "highs",
        "routing_key": "starter.starter",
    },
}
So now I have 2 queues, one for high-priority and one for low-priority tasks.
The problem is: how do I start 2 celeryd instances with different concurrency settings?
Previously Celery was used in daemon mode (according to this), so only /etc/init.d/celeryd start was required, but now I have to run 2 different celeryd instances with different queues and concurrency. How can I do that?
Based on the above answer, I formulated the following /etc/default/celeryd file (originally based on the configuration described in the docs here: http://ask.github.com/celery/cookbook/daemonizing.html) which works for running two celery workers on the same machine, each worker servicing a different queue (in this case the queue names are "default" and "important").
Basically this answer is just an extension of the previous answer in that it simply shows how to do the same thing, but for celery in daemon mode. Please note that we are using django-celery here:
CELERYD_NODES="w1 w2"
# Where to chdir at start.
CELERYD_CHDIR="/home/peedee/projects/myproject/myproject"
# Python interpreter from environment.
#ENV_PYTHON="$CELERYD_CHDIR/env/bin/python"
ENV_PYTHON="/home/peedee/projects/myproject/myproject-env/bin/python"
# How to call "manage.py celeryd_multi"
CELERYD_MULTI="$ENV_PYTHON $CELERYD_CHDIR/manage.py celeryd_multi"
# How to call "manage.py celeryctl"
CELERYCTL="$ENV_PYTHON $CELERYD_CHDIR/manage.py celeryctl"
# Extra arguments to celeryd
# Longest task: 10 hrs (as of writing this, the UpdateQuanitites task takes 5.5 hrs)
CELERYD_OPTS="-Q:w1 default -c:w1 2 -Q:w2 important -c:w2 2 --time-limit=36000 -E"
# Name of the celery config module.
CELERY_CONFIG_MODULE="celeryconfig"
# %n will be replaced with the nodename.
CELERYD_LOG_FILE="/var/log/celery/celeryd.log"
CELERYD_PID_FILE="/var/run/celery/%n.pid"
# Name of the projects settings module.
export DJANGO_SETTINGS_MODULE="settings"
# celerycam configuration
CELERYEV_CAM="djcelery.snapshot.Camera"
CELERYEV="$ENV_PYTHON $CELERYD_CHDIR/manage.py celerycam"
CELERYEV_LOG_FILE="/var/log/celery/celerycam.log"
# Where to chdir at start.
CELERYBEAT_CHDIR="/home/peedee/projects/cottonon/cottonon"
# Path to celerybeat
CELERYBEAT="$ENV_PYTHON $CELERYBEAT_CHDIR/manage.py celerybeat"
# Extra arguments to celerybeat. This is a file that will get
# created for scheduled tasks. It's generated automatically
# when Celerybeat starts.
CELERYBEAT_OPTS="--schedule=/var/run/celerybeat-schedule"
# Log level. Can be one of DEBUG, INFO, WARNING, ERROR or CRITICAL.
CELERYBEAT_LOG_LEVEL="INFO"
# Log file locations
CELERYBEAT_LOGFILE="/var/log/celerybeat.log"
CELERYBEAT_PIDFILE="/var/run/celerybeat.pid"
It seems the answer, celeryd-multi, is currently not well documented.
What I needed can be done by the following command:
celeryd-multi start 2 -Q:1 default -Q:2 starters -c:1 5 -c:2 3 --loglevel=INFO --pidfile=/var/run/celery/${USER}%n.pid --logfile=/var/log/celeryd.${USER}%n.log
This starts 2 workers listening to different queues (-Q:1 is default, -Q:2 is starters) with different concurrencies (-c:1 5 and -c:2 3).
Another alternative is to give each worker process a unique name, using the -n argument.
I have two Pyramid apps running on the same physical hardware, each with its own Celery instance (within its own virtualenv).
Both are controlled by Supervisor, each with its own supervisord.conf file.
app1:
[program:celery]
autorestart=true
command=%(here)s/../bin/celery worker -n ${HOST}.app1 --app=app1.queue -l debug
directory=%(here)s
[2013-12-27 10:36:24,084: WARNING/MainProcess] celery@maz.local.app1 ready.
app2:
[program:celery]
autorestart=true
command=%(here)s/../bin/celery worker -n ${HOST}.app2 --app=app2.queue -l debug
directory=%(here)s
[2013-12-27 10:35:20,037: WARNING/MainProcess] celery@maz.local.app2 ready.
An update:
In Celery 4.x, the following works properly:
celery multi start 2 -Q:1 celery -Q:2 starters -A $proj_name
Or, if you want to designate each instance's name, you could use:
celery multi start name1 name2 -Q:name1 celery -Q:name2 queue_name -A $proj_name
However, I find that celery multi does not print detailed logs on screen, since it seems to be only a script shortcut for booting up these instances.
I guess it would also work to start these instances one by one manually, giving them different node names but the same -A $proj_name, though it is a bit of a waste of time.
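Started by hand, that would look something like the following sketch (queue names from the command above, concurrencies from the earlier celeryd-multi example; %h is expanded by Celery to the hostname):
celery -A $proj_name worker -n worker1@%h -Q celery -c 5 -l INFO
celery -A $proj_name worker -n worker2@%h -Q starters -c 3 -l INFO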
By the way, according to the official documentation, you can kill all Celery workers simply with:
ps auxww | grep 'celery worker' | awk '{print $2}' | xargs kill -9