How to trigger a scheduled Oozie job right away? - scheduling

I've submitted a job to Oozie using the following command:
oozie job -config ${config_file} -submit
My job is scheduled to run at 5 UTC every day (frequency = 1440). My question is: how do I trigger an execution outside of this schedule? Let's say I've submitted a job at 7 UTC but don't want to wait until 5 UTC the next day and want to trigger it manually right after submission.
I've tried to start a job:
oozie job -oozie <host> -start <coordinator-job-id>-C
But got:
Error: E0303 : E0303: Invalid parameter value, [action] = [start]
Properties file content:
nameNode=hdfs://<namenode>:8020
jobTracker=http://<namenode>:23140
queueName=root.oozie
user=${user.name}
oozie.libpath=/user/oozie/share/lib
oozie.use.system.libpath=true
oozie.coord.application.path=${nameNode}/user/${user.name}/<job.location>
appPath=${oozie.coord.application.path}
initTime=2020-04-20T00:15Z
interval=0
frequency=1440
start=2020-04-20T00:50Z
oozie.launcher.mapreduce.map.cpu.vcores=1
Thank you

The command below should work, provided the configuration is valid:
oozie job -oozie <oozie_host> -start <workflow_id>
Note: A coordinator job does not support the start action.
Please share the job files if you still see errors.
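For reference, a rough sketch of two common ways to get an immediate run (the host, paths, and IDs below are placeholders, not values from the question). One option is to run the underlying workflow directly, bypassing the coordinator, with a properties file that points at the workflow application path:
oozie job -oozie http://<oozie_host>:11000/oozie -config job.properties -run
where job.properties sets oozie.wf.application.path=${nameNode}/user/${user.name}/<job.location> in place of oozie.coord.application.path. Another option, once the coordinator has materialized at least one action, is to rerun that action right away:
oozie job -oozie http://<oozie_host>:11000/oozie -rerun <coordinator-job-id>-C -action 1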

Related

How do I get parallel jobs working on failure

If I have a set of parallel jobs and one fails while the other succeeds,
and I try resume execution/retry failed nodes, it triggers both jobs again.
Is there any setting in Rundeck which can trigger only the failed job and not rerun the entire group again? Is this a bug?
The reason is that the job reference steps run within a single job; by design, Rundeck considers that one parent job execution rather than individual executions (the child jobs, i.e. the parallel jobs). If you want to avoid this, run these jobs individually from an inline-script step (a job that calls each job using the rd CLI or the Rundeck API), as sketched below.
That way you can retry only the failed execution.
Now, to resume from a failed step, you can use the Job Resume Plugin (Rundeck Enterprise only). The plugin allows a job to resume at the failed step.
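A minimal sketch of such an inline-script step, assuming hypothetical project and job names (replace with your own); each rd run call creates a separate execution that can be retried independently:
rd run --project my-project --job "my-group/parallel-job-1" --follow
rd run --project my-project --job "my-group/parallel-job-2" --follow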

AWS Glue job run not respecting Timeout and not stopping

I am running AWS Glue jobs using PySpark. They have a Timeout (as visible on the screenshot) of 1440 minutes, which is 24 hours. Nevertheless, the jobs keep running beyond those 24 hours.
When this particular job had been running for over 5 days, I stopped it manually (by clicking the stop icon in the "Run status" column of the GUI visible on the screenshot). However, since then (it has been over 2 days) it still hasn't stopped: the "Run status" is Stopping, not Stopped.
Additionally, after about 4 hours of running, new logs (column "Logs") in CloudWatch for this job run stopped appearing (my PySpark script has print() statements which regularly and often log extra data). Also, the last error log in CloudWatch (column "Error logs") was written 24 seconds after the date of the newest entry in "Logs".
This behaviour continues for multiple jobs.
My questions are:
What could be the reasons for job runs not obeying the set Timeout value? How can I fix that?
Why is the newest log from 4 hours after the job run started, when logs should appear regularly during the (desired) 24-hour duration of the run?
Why don't the job runs stop when I try to stop them manually? How can they be stopped?
Thank you in advance for your advice and hints.
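One thing that may be worth trying for runs stuck in Stopping (a sketch only; the job name and run ID below are placeholders) is issuing the stop through the AWS CLI rather than the console:
aws glue batch-stop-job-run --job-name my-glue-job --job-run-ids jr_0123456789abcdef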

Unable to drain/cancel a Dataflow job. It stays in a pending state

Some jobs remain in a pending state and I can't cancel them.
How do I cancel these jobs?
The web console shows the following:
"The graph is still being analyzed."
All logs show "No entries found matching current filter."
Job status: "Starting..."
A cancel button hasn't appeared yet.
There are no instances in the Compute Engine tab.
What I did is below.
I created a streaming job. It was a simple template job, Pub/Sub subscription to BigQuery. I set machineType to e2-micro because it was just a test.
I also tried to drain and cancel via gcloud, but it doesn't work.
$ gcloud dataflow jobs drain --region asia-northeast1 JOBID
Failed to drain job [...]: (...): Workflow modification failed. Causes: (...):
Operation drain not allowed for JOBID.
Job is not yet ready for draining. Please retry in a few minutes.
Please ensure you have permission to access the job and the `--region` flag, asia-northeast1, matches the job's
region.
This is the jobs list:
$ gcloud dataflow jobs list --region asia-northeast1
JOB_ID NAME TYPE CREATION_TIME STATE REGION
JOBID1 pubsub-to-bigquery-udf4 Streaming 2021-02-09 04:24:23 Pending asia-northeast1
JOBID2 pubsub-to-bigquery-udf2 Streaming 2021-02-09 03:20:35 Pending asia-northeast1
...other jobs...
Please let me know how to stop/cancel/delete these streaming jobs.
Job IDs:
2021-02-08_20_24_22-11667100055733179687
WebUI:
https://i.stack.imgur.com/B75OX.png
https://i.stack.imgur.com/LzUGQ.png
In my personal experience, some instances occasionally get stuck: they keep running, cannot be canceled, or you cannot see the graphical Dataflow pipeline. The best way to handle this kind of issue is to leave them in that status, unless it impacts your solution by exceeding the maximum number of concurrent runs at a given moment. They will be canceled automatically or by the Google team, since Dataflow is a Google-managed service.
In the GCP console Dataflow UI, if you have running Dataflow jobs, you will see a "STOP" button, just like in the image below.
Press the STOP button.
When you successfully stop your job, you will see a status like the one below. (I was too slow to stop the job on the first try, so I had to test it again. :) )
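If the console button never appears, a hedged sketch of the gcloud route, using the job ID and region from the question, would be:
gcloud dataflow jobs cancel 2021-02-08_20_24_22-11667100055733179687 --region=asia-northeast1
Recent gcloud releases also document a --force flag on cancel for jobs stuck in a cancelling state; whether it applies here depends on your gcloud version.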

Triggered WebJob not scheduled to run

Our Azure website contains a couple of WebJobs (say job1 and job2) triggered by a CRON expression in the settings.job file.
{
"schedule": "0 * * * * *"
}
Every now and then, job2 stops getting scheduled. This website has 'Always On' turned on. Looking at the portal, the last runtime of job2 is 1/1/0001 12:00:00 AM. And looking at the scheduler logs for both job1 and job2, we find that job1 has messages like the below:
[10/19/2015 19:19:00 > 846c07: SYS INFO] WebJob invoked
[10/19/2015 19:19:00 > 846c07: SYS INFO] Next schedule expected in 00:00:59.2588341
[10/19/2015 19:19:00 > a033a5: SYS INFO] Next schedule expected in 00:00:59.9580454
The "WebJob invoked" message, however, is missing from job2's logs. That indicates the job is not being invoked. Usually the problem disappears if I hit the Run once button in the portal for the job, but the issue keeps repeating. What's the best way to troubleshoot or prevent such an issue?
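While troubleshooting, one option (a sketch; the site name, job name, and deployment credentials from your publish profile are placeholders) is to trigger the job and inspect its last run through the Kudu REST API instead of the portal:
curl -u '$sitename:password' -X POST "https://sitename.scm.azurewebsites.net/api/triggeredwebjobs/job2/run"
curl -u '$sitename:password' "https://sitename.scm.azurewebsites.net/api/triggeredwebjobs/job2"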

Scheduled task "Daily every" not firing

I have the Developer Edition of CF running on my machine, and I have a job that is scheduled to run:
Daily every 9 min(s) from 12:01 AM to 12:59 PM
but it's not running.
I can press the "Run Scheduled Task" button and it runs, but it's not running on its own.
I have other jobs that run daily, but this one is not running every 9 minutes.
Check the scheduler.log file for its execution and the next rescheduled time. If it shows a time that is not what you set, delete the job and recreate it.
I have faced the same problem, and this is how I got it running again.
The best way to find out what's going on with the job is to take a look at the scheduler log in the CF Admin. After running the job, you should be able to check and see the next time it's scheduled to run.
Also, make sure the job isn't paused on the Scheduled Tasks page.
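As a small aside (the path below assumes a default ColdFusion install location and a hypothetical task name; adjust both for your setup), you can confirm the next fire time from a shell by searching the scheduler log:
grep "MyTask" /opt/coldfusion/cfusion/logs/scheduler.log | tail -n 20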