Sidekiq: How to assign a process to a worker? - ruby-on-rails-4

I'm having a bit of a struggle in Sidekiq with multiple workers and multiple Sidekiq processes.
I'd like to run three Sidekiq processes for an environment.
I have three worker classes (let's say "worker1", "worker2" and "worker3") and three Sidekiq processes (let's say "process1", "process2" and "process3").
In my current setup the workers run on whichever process is available. What I want is for worker1 to run only on process1, worker2 only on process2, and so on.
I'm a bit confused about how to achieve this, and I'd be glad to know how to pin a worker to a particular Sidekiq process.
Sidekiq processes are:
process1: bundle exec sidekiq -q default
process2: bundle exec sidekiq -C config/myapp_sidekiq.yml
process3: bundle exec sidekiq -q process3
Thanks in advance...

You assign processes to pull jobs from queues, and you assign workers to push their jobs into queues.
process1: bundle exec sidekiq -q queue1
class Worker1
  include Sidekiq::Worker
  sidekiq_options queue: 'queue1'
end
# will only be processed by process1
Worker1.perform_async
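Applied to the question's setup, the mapping could look like this; the queue names are just placeholders, what matters is that each worker's sidekiq_options and the corresponding process agree on the queue name:
process1: bundle exec sidekiq -q queue1
process2: bundle exec sidekiq -q queue2
process3: bundle exec sidekiq -q queue3
Worker2 and Worker3 then declare sidekiq_options queue: 'queue2' and queue: 'queue3' respectively, mirroring the Worker1 class above. The same queue list can also live in a YAML file passed with -C, as in the original process2 command.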

Related

Correct Way for Celery Process Architecture and daemonizing

I have a Python/Django project running on uwsgi/nginx. For asynchronous tasks we are using rabbitmq/celeryd, and supervisord manages all the daemons.
Versions:
python: 2.7
django: 1.9.7
celery: 3.1.23
django-celery: 3.1.17
Celery has 10 queues of type Direct (say queue1, queue2, ...)
Each queue is handled by a separate celeryd process which is managed via supervisord. Each supervisord program looks like the following:
[program:app_queue_worker]
command=/var/www/myproj/venv/bin/celery worker -A myproj -c 2 --queue=queue1 --loglevel=INFO
directory=/var/www/myproj/
user=ubuntu
numprocs=1
autostart=true
autorestart=true
startsecs=10
exitcodes=1
stopwaitsecs = 600
killasgroup=true
priority=1000
Hence supervisord is running 10 main processes and 20 worker processes.
Another thing I have noticed is that uwsgi also spawns some celery workers (I don't understand how or why, yet) with concurrency=2. So if I have 4 uwsgi processes running, I will have an additional 10 celery workers running.
All of these workers are each taking 200-300 MB of memory? Something feels wrong here, but I can't put my finger on it. Celery shouldn't be running such memory-heavy processes?
Note: DEBUG=False, so there is no memory leakage due to debug mode.
Can someone please comment on whether this architecture is correct or wrong?
Would it be better to run 2-3 celery main processes which listen to all queues at once, and increase their concurrency?
Update : celery.py Config
import os

from celery import Celery

# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'MyProject.settings')

from django.conf import settings  # noqa
from chatterbox import celery_settings

app = Celery('MyProject')

# Using a string here means the worker will not have to
# pickle the object when using Windows.
app.config_from_object('django.conf:settings')
app.conf.update(
    CELERY_RESULT_BACKEND='djcelery.backends.database:DatabaseBackend',
    CELERYD_CONCURRENCY=1,
)
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
There is no simple answer to this.
To me, the fact that uwsgi spawns celery workers is wrong.
Creating only worker processes that consume all queues might lead to a situation where long-running tasks make some queues overflow, whereas separate workers that consume specific queues with short-running tasks can improve the situation. Everything depends on your use case.
The 300 MB of resident memory per worker is quite a lot. If the tasks are I/O bound, go multi-threaded/gevent. However, if the tasks are CPU bound, you have no option other than to scale process-wise.
If you start a celery worker with a concurrency of n, it will spawn n + 1 processes by default. Since you are spawning 10 workers with a concurrency of 2, celery will start 30 processes.
Each worker consumes ~60 MB (~30 MB for the main process and 2 * ~15 MB for the subprocesses) of memory when not consuming queues. It might vary depending on what your worker is doing. If you start 10 workers, they will consume ~600 MB of memory.
I am not sure how you came to know that uwsgi also spawns some celery workers. Only supervisord should spawn those processes.
You can run just one celery worker which listens to all queues with a concurrency of 20. This will reduce your memory usage at the cost of flexibility. With this setup, you can't start/stop consuming from selected queues, and there is no guarantee that all queues will be consumed equally.
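If you go that route, a supervisord program along the lines of the one in the question might look like the sketch below; the queue list and the concurrency value are purely illustrative of the trade-off, not a recommendation:
[program:app_all_queues_worker]
command=/var/www/myproj/venv/bin/celery worker -A myproj -c 20 -Q queue1,queue2,queue3,queue4,queue5,queue6,queue7,queue8,queue9,queue10 --loglevel=INFO
directory=/var/www/myproj/
user=ubuntu
autostart=true
autorestart=true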

Sidekiq - job stuck in enqueued

I have a Rails 4 app. I'm using paperclip and delayed_paperclip with Sidekiq. The background job shows up in Sidekiq but just sits forever in the Enqueued queue.
This is what I see on sidekiq:
{"job_class"=>"DelayedPaperclip::Jobs::ActiveJob", "job_id"=>"946856ca-c90e-4bb4-9f41-1f1e59269acb", "queue_name"=>"paperclip", "arguments"=>["User", 109, "avatar"]}
In my User model I have:
process_in_background :avatar
procfile.yml
worker: bundle exec sidekiq
I have read all the related issues to this question and still couldn't figure it out. Thoughts, anyone?
Thanks.
You've enqueued the job to the paperclip queue but you have not configured Sidekiq to pull jobs from that queue:
bundle exec sidekiq -q default -q paperclip
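If you prefer to keep the Procfile entry free of flags, the same queue list can go into a config file loaded with -C; the filename below is just the conventional one:
config/sidekiq.yml:
:queues:
  - default
  - paperclip
procfile.yml:
worker: bundle exec sidekiq -C config/sidekiq.yml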

celery control add_consumer giving Error: No nodes replied within time constraint

I want to configure a celery worker to consume only from a particular queue.
I saw in the celery docs that control add_consumer does exactly that.
The problem is, when I try:
celery control -A [App_name] add_consumer [queue_name] worker1.h%
it gives me the error:
Error: No nodes replied within time constraint
Any help is really appreciated.
Is there any other way I can make my worker consume from a specific queue?
Note: celery -A [App_name] worker1.h%
starts the celery worker, and everything works fine, except that it consumes from all my queues. I want to dedicate a worker to consuming from a specific queue.
Broker used: rabbitmq
I would just run a separate worker:
celery -A app_name worker -Q queue_name --concurrency=1
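If you still want the add_consumer route, the control command has to reach a running worker by its node name; the form shown in the celery docs looks like this (the node name here is just an example):
celery -A app_name control add_consumer queue_name -d celery@worker1_hostname
The "No nodes replied within time constraint" error usually just means no running worker answered the broadcast.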

Django celery task execution

I have 3 tasks in my tasks.py file. When the daemon is running, only one task executes; the other two show up in debug mode but do not execute.
celery worker --concurrency=1
Are any changes needed?
Edit :
@celery.decorators.periodic_task(run_every=datetime.timedelta(minutes=1))
def task1():
    try:
        sc = SampleCount.objects.get(pk=1)
    except:
        sc = SampleCount()
    sc.num = sc.num + 1
    sc.save()
    return sc
The same pattern is used for task2, which adds data to SampleCount2 (every 2 minutes), and task3, which adds data to SampleCount3 (every 3 minutes).
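A sketch of what task2 looks like under that description; task3 is the same with SampleCount3 and minutes=3:
@celery.decorators.periodic_task(run_every=datetime.timedelta(minutes=2))
def task2():
    try:
        sc = SampleCount2.objects.get(pk=1)
    except:
        sc = SampleCount2()
    sc.num = sc.num + 1
    sc.save()
    return sc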
Now I run
python manage.py celeryd -E -B --loglevel=DEBUG
so it displays 3 tasks. But when I check the status, only task1 is active; the other two are not scheduled. If I run
python manage.py celeryd -E -B --loglevel=DEBUG
again, then task3 becomes active and the other two do not work. How do I run all three tasks using one worker?
I have waited up to 10 minutes, but only one task works (it may be task1, task2 or task3).
It depends on what resources are available on the server.
There is no requirement that the number of tasks must equal the number of workers. The only thing you need to understand is:
the more workers are running, the faster all tasks will be processed, but the more server resources will be needed.
This command:
celery worker --concurrency=1
will spawn two processes:
1. a control process that manages the workers
2. a worker process, the one that actually does the job
And this command:
celery worker --concurrency=2
will spawn three processes:
1. a control process that manages the workers
2. worker process #1, the one that actually does the job
3. worker process #2, the one that actually does the job
And so on.
One worker can process all of your tasks, but two workers will do it faster.

How do I restart celery workers gracefully?

When issuing a new build to update the code in the workers, how do I restart the celery workers gracefully?
Edit:
What I intend to do is something like this (a rough sketch follows the list of steps):
A worker is running, probably uploading a 100 MB file to S3.
A new build comes.
The worker code has changed.
The build script fires a signal to the worker(s).
New workers start with the new code.
The worker(s) that got the signal exit after finishing their current job.
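A minimal sketch of that hand-off, assuming the workers run with pidfiles as in the celery multi answers below (the paths, node names and project name are illustrative):
# warm shutdown: the old worker stops taking new jobs, finishes the job it is
# currently running (e.g. the S3 upload), then exits
kill -TERM $(cat /var/run/celery/w1.pid)
# bring up a worker on the new code under a different node name so the two can overlap
celery multi start w1_new -A proj -l info --pidfile=/var/run/celery/%n.pid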
According to https://docs.celeryq.dev/en/stable/userguide/workers.html#restarting-the-worker you can restart a worker by sending a HUP signal
ps auxww | grep celeryd | grep -v "grep" | awk '{print $2}' | xargs kill -HUP
celery multi start 1 -A proj -l info -c4 --pidfile=/var/run/celery/%n.pid
celery multi restart 1 --pidfile=/var/run/celery/%n.pid
http://docs.celeryproject.org/en/latest/userguide/workers.html#restarting-the-worker
If you're going the kill route, pgrep to the rescue:
kill -9 `pgrep -f celeryd`
Mind you, this is not a long-running task and I don't care if it terminates brutally. Just reloading new code during dev. I'd go the restart service route if it was more sensitive.
You can do:
celery multi restart w1 -A your_project -l info # restart workers
You should look at Celery's autoreloading
What should happen to long running tasks? I like it this way: long running tasks should do their job. Don't interrupt them, only new tasks should get the new code.
But this is not possible at the moment: https://groups.google.com/d/msg/celery-users/uTalKMszT2Q/-MHleIY7WaIJ
I have repeatedly tested the -HUP solution using an automated script, but find that about 5% of the time, the worker stops picking up new jobs after being restarted.
A more reliable solution is:
stop <celery_service>
start <celery_service>
which I have used hundreds of times now without any issues.
From within Python, you can run:
import subprocess

service_name = 'celery_service'
for command in ['stop', 'start']:
    subprocess.check_call(command + ' ' + service_name, shell=True)
If you're using docker/docker-compose and putting celery into a separate container from the Django container, you can use
docker-compose kill -s HUP celery
where celery is the container name. The worker will be gracefully restarted and the ongoing task is not brutally stopped.
I tried pkill, kill, celery multi stop, celery multi restart, and docker-compose restart. None of them worked: either the container is stopped abruptly or the code is not reloaded.
I just want to reload my code on the prod server manually with a one-liner. I don't want to play with daemonization.
Might be late to the party. I use:
sudo systemctl stop celery
sudo systemctl start celery
sudo systemctl status celery