Fair Scheduler without preemption on YARN - mapreduce

I have the following FairScheduler configuration for a M/R job on EMR 5.16.0 (Hadoop 2.8.3):
queue1 - weight 1.0
queue2 - weight 3.0
This 2 queues are under root queue.
I start an application app1 on queue1 and given the fact there is nothing else running, the application will take 100% of the EMR cluster.
While this queue1 is still at 100%, I start a new application app2. My expectation would be that app2 is waiting for a task from app1 to finish, and after this, it receives the memory/vcores.
Unfortunately, after my experiments I see that even if an app1 task finishes, in some cases the container is not given to app2, but tasks for app1 are rescheduled on it.
The M/R tasks for app1 and app2 have the same memory/vcores. All the configurations (except the queue config, mapreduce.map.memory.mb, mapreduce.reduce.memory.mb, mapreduce.map.cpu.vcores, mapreduce.reduce.cpu.vcores) have the default values.
Eventually, queue2 reaches 75% of the cluster capacity, but after too many app1 tasks.
I have tried to use fairSharePreemption, but this kills tasks from app1 and I do not want this.
I also tried using drf schedulingPolicy for my queues, but still not the result I want.
Do you know any configurations I could try so that app2 waits for the app1 tasks, but it receives the free slots without app1 rescheduling any of its tasks (again) on it?

Related

Scheduler duplicate email 8 times [duplicate]

We have a web app made with pyramid and served through gunicorn+nginx. It works with 8 worker threads/processes
We needed to jobs, we have chosen apscheduler. here is how we launch it
from apscheduler.events import EVENT_JOB_EXECUTED, EVENT_JOB_ERROR
from apscheduler.scheduler import Scheduler
rerun_monitor = Scheduler()
rerun_monitor.start()
rerun_monitor.add_interval_job(job_to_be_run,\
seconds=JOB_INTERVAL)
The issue is that all the worker processes of gunicorn pick the scheduler up. We tried implementing a file lock but it does not seem like a good enough solution. What would be the best way to make sure at any given time only one of the worker process picks the scheduled event up and no other thread picks it up till next JOB_INTERVAL?
The solution needs to work even with mod_wsgi in case we decide to switch to apache2+modwsgi later. It needs to work with single process development server which is waitress.
Update from the bounty sponsor
I'm facing the same issue described by the OP, just with a Django app. I'm mostly sure adding this detail won't change much if the original question. For this reason, and to gain a bit more of visibility, I also tagged this question with django.
Because Gunicorn is starting with 8 workers (in your example), this forks the app 8 times into 8 processes. These 8 processes are forked from the Master process, which monitors each of their status & has the ability to add/remove workers.
Each process gets a copy of your APScheduler object, which initially is an exact copy of your Master processes' APScheduler. This results in each "nth" worker (process) executing each job a total of "n" times.
A hack around this is to run gunicorn with the following options:
env/bin/gunicorn module_containing_app:app -b 0.0.0.0:8080 --workers 3 --preload
The --preload flag tells Gunicorn to "load the app before forking the worker processes". By doing so, each worker is "given a copy of the app, already instantiated by the Master, rather than instantiating the app itself". This means the following code only executes once in the Master process:
rerun_monitor = Scheduler()
rerun_monitor.start()
rerun_monitor.add_interval_job(job_to_be_run,\
seconds=JOB_INTERVAL)
Additionally, we need to set the jobstore to be anything other than :memory:.This way, although each worker is its own independent process unable of communicating with the other 7, by using a local database (rather then memory) we guarantee a single-point-of-truth for CRUD operations on the jobstore.
from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore
rerun_monitor = Scheduler(
jobstores={'default': SQLAlchemyJobStore(url='sqlite:///jobs.sqlite')})
rerun_monitor.start()
rerun_monitor.add_interval_job(job_to_be_run,\
seconds=JOB_INTERVAL)
Lastly, we want to use the BackgroundScheduler because of its implementation of start(). When we call start() in the BackgroundScheduler, a new thread is spun up in the background, which is responsible for scheduling/executing jobs. This is significant because remember in step (1), due to our --preload flag we only execute the start() function once, in the Master Gunicorn process. By definition, forked processes do not inherit the threads of their Parent, so each worker doesn't run the BackgroundScheduler thread.
from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore
rerun_monitor = BackgroundScheduler(
jobstores={'default': SQLAlchemyJobStore(url='sqlite:///jobs.sqlite')})
rerun_monitor.start()
rerun_monitor.add_interval_job(job_to_be_run,\
seconds=JOB_INTERVAL)
As a result of all of this, every Gunicorn worker has an APScheduler that has been tricked into a "STARTED" state, but actually isn't running because it drops the threads of it's parent! Each instance is also capable of updating the jobstore database, just not executing any jobs!
Check out flask-APScheduler for a quick way to run APScheduler in a web-server (like Gunicorn), and enable CRUD operations for each job.
I found a fix that worked with a Django project having a very similar issue. I simply bind a TCP socket the first time the scheduler starts and check against it subsequently. I think the following code can work for you as well with minor tweaks.
import sys, socket
try:
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("127.0.0.1", 47200))
except socket.error:
print "!!!scheduler already started, DO NOTHING"
else:
from apscheduler.schedulers.background import BackgroundScheduler
scheduler = BackgroundScheduler()
scheduler.start()
print "scheduler started"
Short answer: You can't do it properly without consequences.
I'm using Gunicorn as an example, but it is essentially the same for uWSGI. There are various hacks when running multiple processes, to name a few:
use --preload option
use on_starting hook to start the APScheduler background scheduler
use when_ready hook to start the APScheduler background scheduler
They work to some extent but may get the following errors:
worker timing out frequently
scheduler hanging when there are no jobs https://github.com/agronholm/apscheduler/issues/305
APScheduler is designed to run in a single process where it has complete control over the process of adding jobs to job stores. It uses threading.Event's wait() and set() methods to coordinate. If they are run by different processes, the coordination wouldn't work.
It is possible to run it in Gunicorn in a single process.
use only one worker process
use the post_worker_init hook to start the scheduler, this will make sure the scheduler is run only in the worker process but not the master process
The author also pointed out sharing the job store amount multiple processes isn't possible. https://apscheduler.readthedocs.io/en/stable/faq.html#how-do-i-share-a-single-job-store-among-one-or-more-worker-processes He also provided a solution using RPyC.
While it's entirely doable to wrap APScheduler with a REST interface. You might want to consider serving it as a standalone app with one worker. In another word, if you have others endpoints, put them in another app where you can use multiple workers.

celery force all celery processes to run task

I have a setup with multiple EC2's and I have celery running on all of them. I also have one box with celerybeat running on it. I'm able to get celerybeat to run with tasks running on the remaining celery clients.
Is there a way to make a required task that all celery instances have to run? The use case would be clearing logs, running basic sanity checks on the boxes, etc.
I have read the below:
http://docs.celeryproject.org/en/latest/userguide/workers.html
Actually celery works not as a push-like system, but as a pull-like one.
You just put a task into the queue and one of available workers get it and execute.
From your question I assume that under instances you mean servers on whose celery's workers are running. So you would like to run the same task on all servers.
I think you can only put some tasks (corresponding to servers number) and specify exact routing number for each task (the same as worker's id).
mytask.apply_async(kwargs={'a': 1, 'b': 2}, routing_key='aaabbc-dddeeff-243453')
mytask.apply_async(kwargs={'a': 1, 'b': 2}, routing_key='bbbbbb-fffddd-dabcfe')
...
Broadcast
Celery can also support broadcast routing. Here is an
example exchange broadcast_tasks that delivers copies of tasks to all
workers connected to it:
from kombu.common import Broadcast
CELERY_QUEUES = (Broadcast('broadcast_tasks'), )
CELERY_ROUTES = {'tasks.reload_cache': {'queue': 'broadcast_tasks'}}
Now the tasks.reload_cache task will be sent to every worker consuming
from this queue.

celery beat messages never arrive to rabbitmq

I'm having a lot of problem executing certain tasks with celery beat. Some tasks like the one below get triggered by beat but the message is never received by rabbitmq.
In my django settings file I have the following perdiodic task
CELERYBEAT_SCHEDULE = {
...
'update_locations': {
'task': 'cron.tasks.update_locations',
'schedule': crontab(hour='10', minute='0')
},
...
}
at 10 UTC beat executes the task as expected
[2015-05-13 10:00:00,046: DEBUG/MainProcess] cron.tasks.update_locations sent. id->a1c53d0e-96ca-4673-9d03-972888581176
but this message is never arrives to rabbitmq (I'm using the tracing module in rabbitmq to track incoming messages). I have several other tasks which seem to run fine but certain tasks like the one above never run. Running the tasks manually in django with cron.tasks.update_locations.delay() runs the task with no problem. Note my Rabbitmq is on a different server than beat.
Is there anything I can do to ensure the message was actually sent and/or received by rabbitmq? Is there a better or other way to schedule these tasks to ensure they run?
A bit hard to answer from these minimal descriptions.
why is this in the Django settings file? I would have expected the Celery config settings to have their own config object.
Look at http://celery.readthedocs.org/en/latest/reference/celery.html#celery.Celery.config_from_object

Workflow of celery

I am a beginner with django, I have celery installed.
I am confused about the working of the celery, if the queued works are handled synchronously or asynchronously. Can other works be queued when the queued work is already being processed?
Celery is a task queuing system, that is backed by a message queuing system, Celery allows you to invoke tasks asynchronously, in a way that won't block your process for the task to finish, you can wait for the task to finish using the AsyncResult.get.
Other tasks can be queued while a task is being processed, and if Celery is running more than one process/thread (which is the default case), tasks will be executed in parallel to each others.
It is your responsibility to make sure that related tasks are executed in the correct order, e.g. if the output of a task A is an input to the other task B then you should make sure that you get the result from task A before you start the task B.
Read Avoid launching synchronous subtasks from Celery documentation.
I think you're possibly a bit confused about what Celery does.
Celery isn't really responsible for queueing at all. That is taken care of by the queue itself - RabbitMQ, Redis, or whatever. The only way Celery gets involved at this end is as a library that you call inside your app to serialize to task into something suitable for putting onto the queue. Since that is done by your web app, it is exactly as synchronous or asynchronous as your app itself: usually, in production, you'd have multiple processes running your site, each of those could put things onto the queue simultaneously, but each queueing action is done in-process.
The main point of Celery is the separate worker processes. This is where the asynchronous bit comes from: the workers run completely separately from your web app, and pick tasks off the queue as necessary. They are not at all involved in the process of putting tasks onto the queue in the first place.

Problems stopping celeryd

I'm running celeryd as a daemon, but I sometimes have trouble stopping it gracefully. When I send the TERM signal and there are items in the queue (in this case service celeryd stop) celeryd will stop taking new jobs, and shut down all the worker processes. The parent process, however, won't shut down.
I've just ran into a scenario where I had celeryd running on two separate worker machines: A and B. With about 1000 messages on the RabbitMQ server, I shut down A, and experienced the situation I've explained above. B continued to work, but then stalled with about 40 messages left on the server. I was however, able to stop B correctly.
I restarted B, to see if it would take the 40 items off the queue, but it would not. Next, I hard killed A, after which B grabbed and completed the tasks.
My conclusions is that the parent process has reserved the 40 items from our RabbitMQ server for its children. It will reap the children correctly, but will not release the items back to RabbitMQ unless I manually kill it.
Has anyone experienced something similar?
I'm running Celery 2.2.2
I believe this is related to:
https://github.com/celery/celery/issues/264
Setting
CELERY_DISABLE_RATE_LIMITS = False
in your settings.py file should work.