Once execution starts, the job completes in just a few seconds, but the pending (start-up) phase itself takes 10 to 15 minutes. I understand that this time is spent setting up the environment for the job, but in my case I need to run this job (transforming JSON) every 15 minutes. Will this work, or is there another option? Am I missing any configuration?
I am running AWS Glue jobs using PySpark. They have a Timeout of 1440 minutes (24 hours) set, as visible in the screenshot. Nevertheless, the job keeps running past those 24 hours.
When this particular job had been running for over 5 days, I stopped it manually (by clicking the stop icon in the "Run status" column of the GUI visible in the screenshot). However, since then (over 2 days now) it still hasn't stopped: the "Run status" is Stopping, not Stopped.
Additionally, after about 4 hours of running, new logs (the "Logs" column) for this Job Run stopped appearing in CloudWatch, even though my PySpark script has print() statements that log extra data regularly and often. Also, the last error log in CloudWatch (the "Error logs" column) was written 24 seconds after the newest entry in "Logs".
The same behaviour occurs for multiple jobs.
My questions are:
What could cause Job Runs to ignore the configured Timeout value? How can I fix that?
Why is the newest log from 4 hours after the Job Run started, when logs should appear regularly throughout the (intended) 24-hour duration of the Job Run?
Why don't the Job Runs stop when I try to stop them manually? How can they be stopped?
Thank you in advance for your advice and hints.
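A minimal sketch for spotting runs that have outlived their configured Timeout. It assumes the run records are shaped like the `JobRuns` entries returned by boto3's Glue `get_job_runs` call (`Id`, `JobRunState` as a string, `StartedOn` as a timezone-aware datetime, `Timeout` in minutes); check those field names against the boto3 documentation before relying on them. The helper `overdue_runs` is hypothetical: it only does the deadline arithmetic (1440 minutes = 24 hours) and does not stop anything.

```python
from datetime import datetime, timedelta, timezone

# States in which a run has not yet finished (assumed subset of Glue's JobRunState values).
ACTIVE_STATES = {"STARTING", "RUNNING", "STOPPING"}

def overdue_runs(job_runs, now):
    """Return (run id, hours past deadline) for unfinished runs older than their Timeout.

    job_runs: list of dicts shaped like boto3 Glue get_job_runs()["JobRuns"] (assumption).
    now: timezone-aware datetime used as the reference time.
    """
    overdue = []
    for run in job_runs:
        if run.get("JobRunState") not in ACTIVE_STATES:
            continue
        # Timeout is expressed in minutes, e.g. 1440 minutes = 24 hours.
        deadline = run["StartedOn"] + timedelta(minutes=run["Timeout"])
        if now > deadline:
            hours_over = (now - deadline).total_seconds() / 3600
            overdue.append((run["Id"], round(hours_over, 1)))
    return overdue

if __name__ == "__main__":
    start = datetime(2021, 9, 1, tzinfo=timezone.utc)
    runs = [
        {"Id": "jr_a", "JobRunState": "STOPPING", "StartedOn": start, "Timeout": 1440},
        {"Id": "jr_b", "JobRunState": "SUCCEEDED", "StartedOn": start, "Timeout": 1440},
    ]
    # Five days after start, jr_a is 4 days (96 hours) past its 24-hour deadline.
    print(overdue_runs(runs, start + timedelta(days=5)))  # [('jr_a', 96.0)]
```

Feeding it real run records would show how far each stuck run is past its deadline, which is useful evidence when raising the issue with AWS Support.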
I have an MWAA Airflow environment in my AWS account. The DAG I am setting up reads a large amount of data from S3 bucket A, filters what I want, and writes the filtered results to S3 bucket B. It needs to run every minute because new data arrives every minute. Each run processes about 200 MB of JSON data.
My initial setup used the mw1.small environment class with 10 worker machines. If I run the task only once in this setup, each run takes about 8 minutes to finish. But when I start the schedule to run every minute, most runs cannot finish, they start to take much longer (around 18 minutes), and they display this error message:
[2021-09-25 20:33:16,472] {{local_task_job.py:102}} INFO - Task exited with return code Negsignal.SIGKILL
I tried upgrading the environment class to mw1.large with 15 workers. More jobs completed before the error appeared, but the environment still could not keep up with ingesting every minute, and the Negsignal.SIGKILL error still appeared before the worker count reached its maximum.
At this point, what should I do to scale this? I could open another Airflow environment, but that does not really make sense; there must be a way to do it within one environment.
I found the solution: for MWAA, edit the environment and, under Airflow configuration options, set these configs:
celery.sync_parallelism = 1
celery.worker_autoscale = 1,1
This makes each worker machine run one task at a time, preventing multiple tasks from sharing a worker, which saves memory and reduces runtime.
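Whether a one-run-per-minute schedule can keep up is largely throughput arithmetic: if a new run starts every minute and each run takes about 8 minutes, roughly 8 runs are in flight at once; if runs slow to about 18 minutes, about 18. A rough sketch, assuming `celery.worker_autoscale = 1,1` caps each worker at one task and that each DAG run is a single task (both are simplifying assumptions, not MWAA guarantees):

```python
import math

def runs_in_flight(run_minutes, interval_minutes):
    """Steady-state number of overlapping runs when one run starts every interval."""
    return math.ceil(run_minutes / interval_minutes)

def worker_capacity(workers, tasks_per_worker):
    """Maximum number of tasks that can execute at the same time."""
    return workers * tasks_per_worker

def keeps_up(run_minutes, interval_minutes, workers, tasks_per_worker=1):
    """True if there are enough task slots to avoid a growing backlog."""
    return worker_capacity(workers, tasks_per_worker) >= runs_in_flight(run_minutes, interval_minutes)

if __name__ == "__main__":
    # Figures from the question: 8-minute runs every minute, 10 workers, one task each.
    print(runs_in_flight(8, 1), keeps_up(8, 1, workers=10))    # 8 True
    # If contention stretches runs to 18 minutes, 10 single-task workers are not enough.
    print(runs_in_flight(18, 1), keeps_up(18, 1, workers=10))  # 18 False
```

This ignores scheduler overhead and worker start-up time, so treat the result as a lower bound on the number of workers needed.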
We set up a schedule to execute a command.
It is scheduled to run every 5 minutes as follows: 20090201T235900|20190201T235900|127|00:05:00
However, the logs show that it runs only once an hour.
Is there a reason for this?
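To check what the schedule string itself asks for, here is a small parser. It assumes the Sitecore schedule format `from|to|days|interval`, with timestamps in `yyyyMMddTHHmmss` form and `days` as a weekday bitmask (Sunday = 1 through Saturday = 64, so 127 means every day); verify this against the Sitecore documentation for your version. `parse_schedule` is a hypothetical helper, not a Sitecore API.

```python
from datetime import datetime, timedelta

DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]

def parse_schedule(value):
    """Split a 'from|to|days|interval' schedule string into its parts (assumed format)."""
    start_s, end_s, days_s, interval_s = value.split("|")
    start = datetime.strptime(start_s, "%Y%m%dT%H%M%S")
    end = datetime.strptime(end_s, "%Y%m%dT%H%M%S")
    # Each bit of the mask enables one weekday: bit 0 = Sunday ... bit 6 = Saturday.
    mask = int(days_s)
    days = [name for bit, name in enumerate(DAY_NAMES) if mask & (1 << bit)]
    hours, minutes, seconds = (int(part) for part in interval_s.split(":"))
    interval = timedelta(hours=hours, minutes=minutes, seconds=seconds)
    return start, end, days, interval

if __name__ == "__main__":
    start, end, days, interval = parse_schedule("20090201T235900|20190201T235900|127|00:05:00")
    print(start, end)   # 2009-02-01 23:59:00 2019-02-01 23:59:00
    print(len(days))    # 7
    print(interval)     # 0:05:00
```

Under these assumptions the string requests a 5-minute interval on every day, so an hourly cadence is more likely to come from somewhere else, such as the global scheduler frequency.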
Check the scheduling frequency in your sitecore.config file:
<sitecore>
<scheduling>
<!-- Time between checking for scheduled tasks waiting to execute -->
<frequency>00:05:00</frequency>
</scheduling>
</sitecore>
The effective interval depends on both the scheduler interval and the job interval. Every scheduler interval, all configured jobs are evaluated, and this evaluation is logged. During the evaluation, each job is checked against the last time it ran: if the elapsed time is greater than the configured job interval, the job is started.
It's fairly simple, but it's important to understand the mechanism. It also shows that there is no built-in way to run jobs at a specific time, only at approximate intervals.
You can also see that jobs can never run more frequently than the scheduler interval regardless of the job interval. It is not unreasonable to set the scheduler to one-minute intervals to reduce the inaccuracy of job timings to no more than a minute.
In the worst case, with a 5-minute scheduler interval and a 5-minute job interval, the delay before the job starts could be up to 9 minutes 59 seconds.
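The two-interval mechanism can be simulated in a few lines. This is a model, not Sitecore's code: it assumes the check is "elapsed time since the recorded last run is at least the job interval" (whether the real comparison is strict is not confirmed here), and it adds a hypothetical `lag` for the gap between the scheduler tick and the moment the last-run time is recorded. Times are in seconds.

```python
def simulate_runs(scheduler_interval, job_interval, horizon, lag=0):
    """Return the scheduler tick times (seconds) at which the job is started.

    scheduler_interval: how often the scheduler wakes up and evaluates jobs.
    job_interval: the minimum time the job wants between runs.
    lag: hypothetical delay between a tick and the recorded last-run time.
    """
    starts = []
    last_run = None
    tick = 0
    while tick <= horizon:
        # A job can only start on a tick, and only if enough time has elapsed.
        if last_run is None or tick - last_run >= job_interval:
            starts.append(tick)
            last_run = tick + lag
        tick += scheduler_interval
    return starts

if __name__ == "__main__":
    # 5-minute job on an hourly scheduler: it can only start once per hour.
    print(simulate_runs(3600, 300, 7200))        # [0, 3600, 7200]
    # 5-minute job on a 5-minute scheduler, last run recorded 1 s after the tick:
    # the check at 300 s sees only 299 s elapsed, so every other tick is missed.
    print(simulate_runs(300, 300, 1800, lag=1))  # [0, 600, 1200, 1800]
    # A 1-minute scheduler limits the extra delay to about a minute.
    print(simulate_runs(60, 300, 1800, lag=1))   # [0, 360, 720, 1080, 1440, 1800]
```

The first case mirrors the hourly behaviour in the question; the second shows how a near-miss on one tick pushes the next start to almost two scheduler intervals after the recorded last run.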
I have the Developer Edition of ColdFusion (CF) running on my machine, and I have a job that is scheduled to run:
Daily every 9 min(s) from 12:01 AM to 12:59 PM
but it's not running.
I can press the "Run Scheduled Task" button and it runs, but it doesn't run on its own.
I have other jobs that run daily, but this one is not running every 9 minutes.
Check the scheduler.log file for its execution and the next scheduled run time. If it shows a time that is not what you set, delete the job and recreate it.
I faced the same problem, and this is how I got it running.
The best way to find out what's going on with the job is to take a look at the scheduler log in the CF Admin. After running the job, you should be able to check and see the next time it's scheduled to run.
Also, make sure the job isn't paused on the Scheduled Tasks page.
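When reading scheduler.log, it can help to know which start times the schedule should produce. A minimal sketch, assuming runs are spaced exactly 9 minutes apart starting at the window start (12:01 AM) and that none start after the window end (12:59 PM); `expected_runs` is a hypothetical helper, the date is arbitrary, and ColdFusion's own calculation may differ.

```python
from datetime import date, datetime, time, timedelta

def expected_runs(day, start, end, every_minutes):
    """List start times within [start, end] on the given day, spaced every N minutes."""
    current = datetime.combine(day, start)
    last = datetime.combine(day, end)
    step = timedelta(minutes=every_minutes)
    times = []
    while current <= last:
        times.append(current)
        current += step
    return times

if __name__ == "__main__":
    runs = expected_runs(date(2024, 1, 1), time(0, 1), time(12, 59), 9)
    # First few and last expected start times, plus how many runs fit in the window.
    print([t.strftime("%H:%M") for t in runs[:3]])  # ['00:01', '00:10', '00:19']
    print(runs[-1].strftime("%H:%M"), len(runs))    # 12:55 87
```

If the next run time shown in scheduler.log does not match one of these, that supports deleting and recreating the task as suggested above.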
Does anyone know whether a scheduled task in ColdFusion resets the interval timer when it runs, or whether it runs at the set interval no matter how long each run takes?
For example, I create a task to run every 10 minutes that takes 5 minutes to run, starting from 12pm. Will the task run at 12:00, then 12:10, then 12:20, and so on?
Or would it run at 12:00 (taking 5 minutes), then at 12:15 (ten minutes after the task finished), then take another 5 minutes, so the next run would be at 12:30, and so on?
Hope that makes sense.
Chris
For example, I create a task to run every 10 minutes that takes 5 minutes to run, starting from 12pm. Will the task run at 12:00, then 12:10, then 12:20, and so on?
Yes.
The task will always run on the interval, so if you set it to every 10 minutes, it will run every 10 minutes after the first run.
Note: If the task runs over time (i.e. takes longer than the interval), then it will NOT queue. That particular run will just be skipped, and the task will run at the next interval as usual.
Hope that helps!
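The two behaviours from the question can be compared with a small model. `fixed_rate` follows the answer above (start on every interval boundary, and skip rather than queue a boundary that arrives while a run is still in progress); `fixed_delay` is the alternative the question asked about. Both are illustrative models under those assumptions, not ColdFusion's implementation; times are minutes after 12:00.

```python
def fixed_rate(interval, durations):
    """Start times when runs are pinned to interval boundaries and overruns skip ticks."""
    starts = []
    tick = 0
    busy_until = 0
    for duration in durations:
        # Skip (do not queue) any boundary that falls while the previous run is busy.
        while tick < busy_until:
            tick += interval
        starts.append(tick)
        busy_until = tick + duration
        tick += interval
    return starts

def fixed_delay(interval, durations):
    """Start times when the timer restarts only after each run finishes."""
    starts = []
    next_start = 0
    for duration in durations:
        starts.append(next_start)
        next_start += duration + interval
    return starts

if __name__ == "__main__":
    print(fixed_rate(10, [5, 5, 5]))   # [0, 10, 20]  i.e. 12:00, 12:10, 12:20
    print(fixed_delay(10, [5, 5, 5]))  # [0, 15, 30]  i.e. 12:00, 12:15, 12:30
    # A 15-minute first run overlaps the 12:10 boundary, so that run is skipped.
    print(fixed_rate(10, [15, 5, 5]))  # [0, 20, 30]
```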