Our Azure website contains a couple of WebJobs (say job1 and job2) that are triggered by a cron expression in a settings.job file:
{
  "schedule": "0 * * * * *"
}
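For reference, this is the six-field WebJobs CRON format {second} {minute} {hour} {day} {month} {day-of-week}, so the expression above should fire every minute, at second 0. A coarser variant, shown here only for comparison (not our actual setting), would be:
{ "schedule": "0 0 * * * *" }
which would run at the top of every hour.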
Every now and then job2 stops getting scheduled. The website has 'Always On' turned on. In the portal, the last run time of job2 shows as 1/1/0001 12:00:00 AM, and looking at the scheduler logs for both jobs, job1 has messages like the ones below:
[10/19/2015 19:19:00 > 846c07: SYS INFO] WebJob invoked
[10/19/2015 19:19:00 > 846c07: SYS INFO] Next schedule expected in 00:00:59.2588341
[10/19/2015 19:19:00 > a033a5: SYS INFO] Next schedule expected in 00:00:59.9580454
The "WebJob invoked" message, however, is missing from job2's logs, which indicates the job is not being invoked. The problem usually disappears if I hit the Run once button in the portal for that job, but the issue keeps coming back. What's the best way to troubleshoot or prevent this?
I have an Apache Airflow managed environment running in which a number of DAGs are defined and enabled. Some DAGs are scheduled, running on a 15 minute schedule, while others are not scheduled. All the DAGs are single-task DAGs. The DAGs are structured in the following way:
level 2 DAGs -> (triggers) level 1 DAG -> (triggers) level 0 DAG
The scheduled DAGs are the level 2 DAGs, while the level 1 and level 0 DAGs are unscheduled. The level 0 DAG uses ECSOperator to call a pre-defined Elastic Container Service (ECS) task, which runs a Python ETL script inside a Docker container defined in the ECS task. The level 2 DAGs wait on the level 1 DAG to complete, which in turn waits on the level 0 DAG to complete. The full Python logs produced by the ETL scripts are visible in the CloudWatch logs from the ECS task runs, while the Airflow task logs only show high-level logging.
The singular tasks in the scheduled DAGs (level 2) have depends_on_past set to False, and I expected that as a result successive scheduled runs of a level 2 DAG would not depend on each other, i.e. that if a particular run failed it would not prevent the next scheduled run from occurring. But what is happening is that Airflow is overriding this and I can clearly see in the UI that a failure of a particular level 2 DAG run is preventing the next run from being selected by the scheduler - the next scheduled run state is being set to None, and I have to manually clear the failed DAG run state before the scheduler can schedule it again.
Why does this happen? As far as I know, there is no Airflow configuration option that should override the task-level setting of False for depends_on_past in the level 2 DAG tasks. Any pointers would be greatly appreciated.
Answering the question "why is this happening?": the behavior you are observing is explained by the tasks being defined with wait_for_downstream=True. The docs state the following about it:
wait_for_downstream (bool) -- when set to true, an instance of task X will wait for tasks immediately downstream of the previous instance of task X to finish successfully or be skipped before it runs. This is useful if the different instances of a task X alter the same asset, and this asset is used by tasks downstream of task X. Note that depends_on_past is forced to True wherever wait_for_downstream is used. Also note that only tasks immediately downstream of the previous task instance are waited for; the statuses of any tasks further downstream are ignored.
Keep in mind that the term previous instance of task X refers to the task_instance of the last scheduled dag_run, not the upstream task (in a DAG with a daily schedule, that would be the task_instance from "yesterday").
This also explains why your tasks run again once you clear the state of the previous DAG run.
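To make this concrete, here is a minimal sketch of what I suspect your level 2 task definition effectively looks like (the DAG/task names and the Airflow 2.x import path are my assumptions, not taken from your code):

from datetime import datetime

from airflow import DAG
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

with DAG(
    dag_id="level_2_example",              # hypothetical name
    schedule_interval="*/15 * * * *",      # every 15 minutes, as in your setup
    start_date=datetime(2021, 1, 1),
    catchup=False,
) as dag:
    TriggerDagRunOperator(
        task_id="trigger_level_1",
        trigger_dag_id="level_1_example",  # hypothetical name
        depends_on_past=False,             # what you set explicitly...
        wait_for_downstream=True,          # ...but this forces depends_on_past=True anyway
    )

If that matches your code, dropping wait_for_downstream (or accepting the forced depends_on_past behavior) should make successive scheduled runs independent again.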
I hope this helps clarify things!
Some jobs are stuck in the Pending state and I can't cancel them.
How do I cancel these jobs?
The web console shows the following:
"The graph is still being analyzed."
All logs are "No entries found matching current filter."
Job status: "Starting..."
A cancel button has not appeared yet.
There are no instances in the Compute Engine tab.
Here is what I did:
I created a streaming job. It was a simple template job, Pub/Sub subscription to BigQuery. I set machineType to e2-micro because it was just a test.
I also tried to drain and cancel the job with gcloud, but it doesn't work:
$ gcloud dataflow jobs drain --region asia-northeast1 JOBID
Failed to drain job [...]: (...): Workflow modification failed. Causes: (...):
Operation drain not allowed for JOBID.
Job is not yet ready for draining. Please retry in a few minutes.
Please ensure you have permission to access the job and the `--region` flag, asia-northeast1, matches the job's
region.
This is the jobs list:
$ gcloud dataflow jobs list --region asia-northeast1
JOB_ID NAME TYPE CREATION_TIME STATE REGION
JOBID1 pubsub-to-bigquery-udf4 Streaming 2021-02-09 04:24:23 Pending asia-northeast1
JOBID2 pubsub-to-bigquery-udf2 Streaming 2021-02-09 03:20:35 Pending asia-northeast1
...other jobs...
Please let me know how to stop/cancel/delete these streaming jobs.
Job IDs:
2021-02-08_20_24_22-11667100055733179687
2021-02-08_20_24_22-11667100055733179687
WebUI:
https://i.stack.imgur.com/B75OX.png
https://i.stack.imgur.com/LzUGQ.png
In my personal experience, instances sometimes get stuck: they keep running, they cannot be canceled, or the graphical Dataflow pipeline is not visible. The best way to handle this kind of issue is to leave the jobs in that status, unless it impacts your solution by exceeding the maximum number of concurrent runs. They will be canceled automatically or by the Google team, since Dataflow is a Google-managed service.
In the GCP console's Dataflow UI, if you have running Dataflow jobs, you will see a STOP button, just like in the image below.
Press the STOP button.
When you successfully stop your job, you will see a status like the one below. (I was too slow to stop the job on my first try, so I had to test it again.)
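If you prefer the CLI, cancelling (rather than draining) can also be attempted, roughly like this (a sketch; substitute your own job ID, and a job still stuck in Pending may reject the cancel just like it rejected the drain):
$ gcloud dataflow jobs cancel JOBID --region asia-northeast1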
My configured email report is named "Raw player games", with crontab */20 * * * * (so I expect a report in my inbox every 20 minutes). See the "raw player games" screenshot.
Another crontab is configured in the main Superset config, superset_config.py:
# superset_config.py
from celery.schedules import crontab

CELERYBEAT_SCHEDULE = {
    'email_reports.schedule_hourly': {
        'task': 'email_reports.schedule_hourly',
        'schedule': crontab(minute=1, hour='*'),  # at minute 1, every hour
    },
}
I receive emails, but only one per hour. I don't see any errors in the logs, and all jobs in Celery Flower are in the success state.
apache-superset==0.37.2
celery==4.4.7
Why does Superset send me reports only once an hour? How do I reconfigure Superset to handle my crontabs correctly? What did I miss?
Note that your beat schedule is configured to run hourly. So at minute one of every hour, beat enqueues a job that checks whether it's time to send a new report; configuring a finer resolution in Superset itself won't matter.
On top of that, by default the email reports functionality has an hourly resolution:
https://github.com/apache/incubator-superset/blob/master/superset/tasks/schedules.py#L823
This default can be changed by configuring:
EMAIL_REPORTS_CRON_RESOLUTION
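For reference, both knobs mentioned above live in superset_config.py. The values below are placeholders for illustration (a sketch, not a tested fix for your install):

# superset_config.py
from celery.schedules import crontab

CELERYBEAT_SCHEDULE = {
    'email_reports.schedule_hourly': {
        'task': 'email_reports.schedule_hourly',
        # enqueue the report-dispatching task more often than once an hour
        'schedule': crontab(minute='*/10'),
    },
}

# resolution (in minutes) used when matching report crontabs
EMAIL_REPORTS_CRON_RESOLUTION = 15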
I've submitted a job to Oozie using the following command:
oozie job -config ${config_file} -submit
My job is scheduled to run at 5 UTC every day (frequency = 1440). My question is: how do I trigger an execution outside of this schedule? Say I submit the job at 7 UTC but don't want to wait until 5 UTC the next day; I want to trigger a run right away, manually, after submission.
I've tried to start a job:
oozie job -oozie host -start coordinator-job-id-C
But got:
Error: E0303 : E0303: Invalid parameter value, [action] = [start]
Properties file content:
nameNode=hdfs://<namenode>:8020
jobTracker=http://<namenode>:23140
queueName=root.oozie
user=${user.name}
oozie.libpath=/user/oozie/share/lib
oozie.use.system.libpath=true
oozie.coord.application.path=${nameNode}/user/${user.name}/<job.location>
appPath=${oozie.coord.application.path}
initTime=2020-04-20T00:15Z
interval=0
frequency=1440
start=2020-04-20T00:50Z
oozie.launcher.mapreduce.map.cpu.vcores=1
Thank you
The below command should work, provided the configuration is valid:
oozie job -oozie <oozie_host> -start <workflow_id>
Note: A coordinator job does not support the start action.
Please provide job files in case of errors.
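If the goal is just an immediate one-off run, one common approach (a sketch, not tested against your setup) is to run the underlying workflow directly, bypassing the coordinator: in a copy of the properties file, set oozie.wf.application.path to the workflow's HDFS path instead of oozie.coord.application.path, then run it:
oozie job -oozie http://<oozie_host>:11000/oozie -config job.properties -run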
I have the developers edition of CF running on my machine, and I have a job that is scheduled to run:
Daily every 9 min(s) from 12:01 AM to 12:59 PM
but it's not running.
I can press the "Run Scheduled Task" button and it runs, but it's not running on its own.
I have other jobs that run daily, but this one is not running every 9 minutes.
Check the scheduler.log file for its execution and the next scheduled run time. If it shows a time that is not what you have set, delete the job and recreate it.
I faced the same problem, and this is how I got it running again.
The best way to find out what's going on with the job is to take a look at the scheduler log in the CF Admin. After running the job, you should be able to check and see the next time it's scheduled to run.
Also, make sure the job isn't paused on the Scheduled Tasks page.