Celery - run different workers on one server - django

I have 2 kind of tasks :
Type1 - A few of high priority small tasks.
Type2 - Lot of heavy tasks with lower priority.
Initially i had simple configuration with default routing, no routing keys were used. It was not sufficient - sometimes all workers were busy with Type2 Tasks, so Task1 were delayed.
I've added routing keys:
CELERY_DEFAULT_QUEUE = "default"
CELERY_QUEUES = {
"default": {
"binding_key": "task.#",
},
"highs": {
"binding_key": "starter.#",
},
}
CELERY_DEFAULT_EXCHANGE = "tasks"
CELERY_DEFAULT_EXCHANGE_TYPE = "topic"
CELERY_DEFAULT_ROUTING_KEY = "task.default"
CELERY_ROUTES = {
"search.starter.start": {
"queue": "highs",
"routing_key": "starter.starter",
},
}
So now i have 2 queues - with high and low priority tasks.
Problem is - how to start 2 celeryd's with different concurrency settings?
Previously celery was used in daemon mode(according to this), so only start of /etc/init.d/celeryd start was required, but now i have to run 2 different celeryds with different queues and concurrency. How can i do it?

Based on the above answer, I formulated the following /etc/default/celeryd file (originally based on the configuration described in the docs here: http://ask.github.com/celery/cookbook/daemonizing.html) which works for running two celery workers on the same machine, each worker servicing a different queue (in this case the queue names are "default" and "important").
Basically this answer is just an extension of the previous answer in that it simply shows how to do the same thing, but for celery in daemon mode. Please note that we are using django-celery here:
CELERYD_NODES="w1 w2"
# Where to chdir at start.
CELERYD_CHDIR="/home/peedee/projects/myproject/myproject"
# Python interpreter from environment.
#ENV_PYTHON="$CELERYD_CHDIR/env/bin/python"
ENV_PYTHON="/home/peedee/projects/myproject/myproject-env/bin/python"
# How to call "manage.py celeryd_multi"
CELERYD_MULTI="$ENV_PYTHON $CELERYD_CHDIR/manage.py celeryd_multi"
# How to call "manage.py celeryctl"
CELERYCTL="$ENV_PYTHON $CELERYD_CHDIR/manage.py celeryctl"
# Extra arguments to celeryd
# Longest task: 10 hrs (as of writing this, the UpdateQuanitites task takes 5.5 hrs)
CELERYD_OPTS="-Q:w1 default -c:w1 2 -Q:w2 important -c:w2 2 --time-limit=36000 -E"
# Name of the celery config module.
CELERY_CONFIG_MODULE="celeryconfig"
# %n will be replaced with the nodename.
CELERYD_LOG_FILE="/var/log/celery/celeryd.log"
CELERYD_PID_FILE="/var/run/celery/%n.pid"
# Name of the projects settings module.
export DJANGO_SETTINGS_MODULE="settings"
# celerycam configuration
CELERYEV_CAM="djcelery.snapshot.Camera"
CELERYEV="$ENV_PYTHON $CELERYD_CHDIR/manage.py celerycam"
CELERYEV_LOG_FILE="/var/log/celery/celerycam.log"
# Where to chdir at start.
CELERYBEAT_CHDIR="/home/peedee/projects/cottonon/cottonon"
# Path to celerybeat
CELERYBEAT="$ENV_PYTHON $CELERYBEAT_CHDIR/manage.py celerybeat"
# Extra arguments to celerybeat. This is a file that will get
# created for scheduled tasks. It's generated automatically
# when Celerybeat starts.
CELERYBEAT_OPTS="--schedule=/var/run/celerybeat-schedule"
# Log level. Can be one of DEBUG, INFO, WARNING, ERROR or CRITICAL.
CELERYBEAT_LOG_LEVEL="INFO"
# Log file locations
CELERYBEAT_LOGFILE="/var/log/celerybeat.log"
CELERYBEAT_PIDFILE="/var/run/celerybeat.pid"

It seems answer - celery-multi - is currently not documented well.
What I needed can be done by the following command:
celeryd-multi start 2 -Q:1 default -Q:2 starters -c:1 5 -c:2 3 --loglevel=INFO --pidfile=/var/run/celery/${USER}%n.pid --logfile=/var/log/celeryd.${USER}%n.log
What we do is starting 2 workers, which are listening to different queues (-Q:1 is default, Q:2 is starters ) with different concurrencies -c:1 5 -c:2 3

Another alternative is to give the worker process a unique name -- using the -n argument.
I have two Pyramid apps running on the same physical hardware, each with its own celery instance(within their own virtualenvs).
They both have Supervisor controlling both of them, both with a unique supervisord.conf file.
app1:
[program:celery]
autorestart=true
command=%(here)s/../bin/celery worker -n ${HOST}.app1--app=app1.queue -l debug
directory=%(here)s
[2013-12-27 10:36:24,084: WARNING/MainProcess] celery#maz.local.app1 ready.
app2:
[program:celery]
autorestart=true
command=%(here)s/../bin/celery worker -n ${HOST}.app2 --app=app2.queue -l debug
directory=%(here)s
[2013-12-27 10:35:20,037: WARNING/MainProcess] celery#maz.local.app2 ready.

An update:
In Celery 4.x, below would work properly:
celery multi start 2 -Q:1 celery -Q:2 starters -A $proj_name
Or if you want to designate instance's name, you could:
celery multi start name1 name2 -Q:name1 celery -Q:name2 queue_name -A $proj_name
However, I find it would not print details logs on screen then if we use celery multi since it seems only a script shortcut to boot up these instances.
I guess it would also work if we start these instances one by one manually by giving them different node names but -A the same $proj_name though it's a bit of a wasting of time.
Btw, according to the official document, you could kill all celery workers simply by:
ps auxww | grep 'celery worker' | awk '{print $2}' | xargs kill -9

Related

Why celery not executing parallelly in Django?

I am having a issue with the celery , I will explain with the code
def samplefunction(request):
print("This is a samplefunction")
a=5,b=6
myceleryfunction.delay(a,b)
return Response({msg:" process execution started"}
#celery_app.task(name="sample celery", base=something)
def myceleryfunction(a,b):
c = a+b
my_obj = MyModel()
my_obj.value = c
my_obj.save()
In my case one person calling the celery it will work perfectly
If many peoples passing the request it will process one by one
So imagine that my celery function "myceleryfunction" take 3 Min to complete the background task .
So if 10 request are coming at the same time, last one take 30 Min delay to complete the output
How to solve this issue or any other alternative .
Thank you
I'm assuming you are running a single worker with default settings for the worker.
This will have the worker running with worker_pool=prefork and worker_concurrency=<nr of CPUs>
If the machine it runs on only has a single CPU, you won't get any parallel running tasks.
To get parallelisation you can:
set worker_concurrency to something > 1, this will use multiple processes in the same worker.
start additional workers
use celery multi to start multiple workers
when running the worker in a docker container, add replica's of the container
See Concurrency for more info.

multi process Daphne with supervisor, getting [Errno 88] Socket operation on non-socket

daphne and Django channels work fine on command line or single process. But when I start it with supervisor, the error occurs.
2020-02-18 12:40:35,995 CRITICAL Listen failure: [Errno 88] Socket operation on non-socket
My config file is
[program:asgi]
socket=tcp://localhost:9000
directory=/root/test/test/
command=daphne -u /run/daphne/daphne%(process_num)d.sock --endpoint fd:fileno=0 --access-log - --proxy-headers test.asgi:application
# Number of processes to startup, roughly the number of CPUs you have
numprocs=2
# Give each process a unique name so they can be told apart
process_name=asgi%(process_num)d
# Automatically start and recover processes
autostart=true
autorestart=true
# Choose where you want your log to go
stdout_logfile=/root/test/test/script/asgi.log
redirect_stderr=true
[supervisord]
[supervisorctl]
Any ideas? Thanks!
There seemed to be a bug with either Python or Twisted unable to bind to the file descriptor 0 as referenced here: https://github.com/django/daphne/issues/263
Try to bind the socket to another fd, say 10 like so:
daphne -u /run/daphne/daphne%(process_num)d.sock --fd 10 --access-log - --proxy-headers test.asgi:application
If that doesn't work, try checking whether you have read and write access to the /run/daphne/ folder itself
First line - [program:asgi]
Must be like this: [fcgi-program:asgi]
If you carefully study the documentation, it is obvious that the first line of the config is incorrect.
https://channels.readthedocs.io/en/stable/deploying.html#example-setups
Look carefully at the given config in the example`

OpenMP code is using only 4 threads instead of the specified 72

I have a program written by someone else that uses OpenMP. I am running it on a cluster that uses Slurm as its job manager. Despite setting OMP_NUM_THREADS=72 and properly requesting 72 cores for the job, the job is only using four cores.
I have already used scontrol show job <job_id> --details to verify that there are 72 cores assigned to the job. I have also remoted into the node that the job is running on and used htop to inspect it. It was running 72 threads, all on four cores. It is worth noting that this is on an SMT4 power9 cpu, meaning that each physical core executes 4 simultaneous threads. Ultimately, it looks like openMP is putting all threads on one physical core. This is further complicated by the fact that this is an IBM system. I can't seem to find any useful documentation on more fine control of the openMP environment. Everything I find is for Intel.
I have also tried using taskset to manually change the affinity. This worked as intended and moved one of the threads to an unused core. The program continued to work as intended after this.
I could theoretically write a script to find all of the threads and call taskset to assign them to cores in a logical way, but I am afraid to do this. It seems like a bad idea to me. It would also take a while.
I guess my main question would be, is this a Slurm problem, an openMP problem, an IBM problem or a user error? Is there some environment variable I don't know about that I need to set? Will it break Slurm if I manually call taskset using a script? I would use scontrol to figure out which cpus are assigned to the job if I did that. I don't want to anger the people who run the cluster by messing things up though.
Here is the submission script. I can't include any of the actual running code due to license issues though. I'm hoping this will just be a simple matter of fixing an environment variable. The MPI_OPTIONS variables were recommended by the guy who administers the system. If by some chance someone here has worked with the ENKI cluster before, that's where this is running.
wrk_path=${PWD}
cat >slurm.sh <<!
#!/bin/bash
#SBATCH --partition=general
#SBATCH --time 2:00:00
#SBATCH -o log
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --ntasks-per-node=1
#SBATCH --ntasks-per-socket=1
#SBATCH --cpus-per-task=72
#SBATCH -c 72
#SBATCH -J Cu-A-E
#SBATCH -D $wrk_path
cd $wrk_path
module load openmpi/3.1.3/2019
module load pgi/2019
export OMP_NUM_THREADS=72
MPI_OPTIONS="--mca btl_openib_if_include mlx5_0"
MPI_OPTIONS="$MPI_OPTIONS --bind-to socket --map-by socket --report-bindings"
time mpirun $MPI_OPTIONS ~/bin/pgmc-enki > out.dat
!
sbatch slurm.sh
Edit: Fix resulted in a 7x speedup when using 72 cores, vs. just running on 4 cores. Considering the nature of the calculations being run, this is pretty good.
Edit 2: Fix resulted in a 17x speedup when using 160 vs. just running on 4 cores.
This might not work for everyone, but I have a really hacky solution. I wrote a python script that uses psutil to find all threads that are children of the running process and set their affinity manually. This script uses scontrol to figure out which cpus are assigned to the job and uses taskset to force the threads to distribute across those cpus.
So far the process is running a lot faster. I'm sure that forcing CPU affinity isn't the best way to do it, but its a lot better than not using the available resources at all.
Here is the basic idea behind the code. The program I am running is called pgmc, hence the variable names. You will need to create an anaconda environment with psutil installed if you are running on a system like mine.
import psutil
import subprocess
import os
import sys
import time
# Gets the id for the current job.
def get_job_id():
return os.environ["SLURM_JOB_ID"]
# Returns a list of processors assigned to the job and the total number of cpus
# assigned to the job.
def get_proc_info():
run_str = 'scontrol show job %s --details'%get_job_id()
stdout = subprocess.getoutput(run_str)
id_spec = None
num_cpus = None
chunks = stdout.split(' ')
for chunk in chunks:
if chunk.lower().startswith("cpu_ids"):
id_spec = chunk.split('=')[1]
start, stop = id_spec.split('-')
id_spec = list(range(int(start), int(stop) + 1))
if chunk.lower().startswith('numcpus'):
num_cpus = int(chunk.split('=')[1])
if id_spec is not None and num_cpus is not None:
return id_spec, num_cpus
raise Exception("Couldn't find information about the allocated cpus.")
if __name__ == '__main__':
# Before we do anything, make sure that we can get the list of cpus
# assigned to the job. Once we have that, run the command line supplied.
cpus, cpu_count = get_proc_info()
if len(cpus) != cpu_count:
raise Exception("CPU list didn't match CPU count.")
# If we successefully got to here, run the command line.
program_name = ' '.join(sys.argv[1:])
pgmc = subprocess.Popen(sys.argv[1:])
time.sleep(10)
pid = [proc for proc in psutil.process_iter() if proc.name() == "your program name here"][0].pid
# Now that we have the pid of the pgmc process, we need to get all
# child threads of the process.
pgmc_proc = psutil.Process(pid)
pgmc_threads = list(pgmc_proc.threads())
# Now that we have a list of threads, we loop over available cores and
# assign threads to them. Once this is done, we wait for the process
# to complete.
while len(pgmc_threads) != 0:
for core_id in cpus:
if len(pgmc_threads) != 0:
thread_id = pgmc_threads[-1].id
pgmc_threads.remove(pgmc_threads[-1])
taskset_string = 'taskset -cp %i %i'%(core_id, thread_id)
print(taskset_string)
subprocess.getoutput(taskset_string)
else:
break
# All of the threads should now be assigned to a core.
# Wait for the process to exit.
pgmc.wait()
print("program terminated, exiting . . . ")
Here is the submission script used.
wrk_path=${PWD}
cat >slurm.sh <<!
#!/bin/bash
#SBATCH --partition=general
#SBATCH --time 2:00:00
#SBATCH -o log
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --ntasks-per-node=1
#SBATCH --ntasks-per-socket=1
#SBATCH --cpus-per-task=72
#SBATCH -c 72
#SBATCH -J Cu-A-E
#SBATCH -D $wrk_path
cd $wrk_path
module purge
module load openmpi/3.1.3/2019
module load pgi/2019
module load anaconda3
# This is the anaconda environment I created with psutil installed.
conda activate psutil-node
export OMP_NUM_THREADS=72
# The two MPI_OPTIONS lines are specific to this cluster if I'm not mistaken.
# You probably won't need them.
MPI_OPTIONS="--mca btl_openib_if_include mlx5_0"
MPI_OPTIONS="$MPI_OPTIONS --bind-to socket --map-by socket --report-bindings"
time python3 affinity_set.py mpirun $MPI_OPTIONS ~/bin/pgmc-enki > out.dat
!
sbatch slurm.sh
My main reason for including the submission script is to demonstrate how the python script is used. More specifically, you call it, with your real job as an argument.

python multiprocessing Pool in Cluster

I am using the following code (example) to do parallel processing in a cluster.
It runs perfectly in the cluster using all available cores in a Node just by typing:
python test.py
from multiprocessing import Pool
import glob
from astropy.io import fits
def read(files):
data=fits.open(files)
if __name__ == '__main__':
files= glob.glob("*.fits*")
nfiles = len(files)
pool = Pool()
pool.map(read, files)
However, when I submit the batch job using
srun -N 1 python test.py
It seems to use only 1 core not all the available cores in that node.
What should I change to distribute nfiles in a node among all the cores such that each core will get nfiles/ncores.
That is because you are specifying -N 1 which means 1 node is assigned to the call python test.py. Also, the default value for another flag -n is 1. Change it to -N nnodes where nnodes equals the number of nodes that you intend to run your script on and specify -n ncores where ncores is equal to the number of cores on each node. If your nodes are heterogenous, then you may run the risk of setting up more tasks than cores on each node or idle cores on nodes.
For more documentation on the above flags see here.

How to manage automatic restarts of multiple Clojure apps with Supervisord?

I do not have any experience with Supervisord. If I look here:
https://tgallant.github.io/clojure/supervisord-with-clojure.html
I see this example of managing a Clojure app with Supervisord:
[program:blog-checker]
command= /usr/local/bin/java -jar target/blog-checker-0.1.0-SNAPSHOT-standalone.jar
directory=/usr/local/www/blog-checker
autostart=true
autorestart=true
startretries=3
user=www
If I would like to use Supervisord to keep 3 instances of my app running, do I create 3 separate entries, or is there a way to have just this one entry, but tell it to keep 3 instances going?
Does your application listen on a port? If so, here's an example of how you would keep four instances of a networked application running (in this case, ApiAxle's API):
[program:apiaxle-api]
process_name = apiaxle-api-%(process_num)s
command = apiaxle-api -f 1 -p %(process_num)s
directory = /home/apiaxle/apiaxle/api
numprocs = 4
numprocs_start = 3000
user=apiaxle
redirect_stderr=true
stdout_logfile=/var/log/apiaxle/api-%(process_num)s-stdout.log
stderr_logfile=/var/log/apiaxle/api-%(process_num)s-stderr.log
Four instances listening on ports 3000, 3001, 3002 and 3003. $(process_num) is a variable that represents the port.
You could do it by specifying numprocs and process_name, e.g.:
numprocs = 3
process_name = %(program_name)s_%(process_num)02d
An unless you want to count your processes from zero, you should specify numprocs_start variable as well:
numprocs_start = 1
You could find complete supervisor documentation here.