I have a scenario where I have to schedule an Airflow DAG with different params based on scheduling criteria.
For instance, I have to:
Schedule DAG on Monday and Tuesday with the variable "source" as "process_usa"
Schedule DAG every Friday with the variable "source" as "process_ca"
Schedule DAG on Tuesday, Wednesday and Sunday with the variable "source" as "process_ind"
Any idea how I can schedule the DAG with different params based on the schedule?
There is no straightforward approach that I know of, but you can have a PythonOperator as the first task (or alternatively wrap your DAG in another DAG, if you don't want to disturb the actual DAG) and handle the schedule check in that function.
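A minimal sketch of that idea, assuming Airflow 2.x; the DAG/task ids and the weekday-to-source mapping below are illustrative, not from the original post:

```python
# Sketch: a daily DAG whose first task decides which "source" values
# apply today and skips the whole run on days with no match.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator, ShortCircuitOperator

# 0 = Monday ... 6 = Sunday, per the schedule in the question.
SOURCES_BY_WEEKDAY = {
    0: ["process_usa"],                 # Monday
    1: ["process_usa", "process_ind"],  # Tuesday
    2: ["process_ind"],                 # Wednesday
    4: ["process_ca"],                  # Friday
    6: ["process_ind"],                 # Sunday
}

def pick_sources(**context):
    # logical_date on Airflow >= 2.2 (execution_date on older versions).
    weekday = context["logical_date"].weekday()
    sources = SOURCES_BY_WEEKDAY.get(weekday)
    if not sources:
        return False  # falsy return short-circuits downstream tasks
    context["ti"].xcom_push(key="sources", value=sources)
    return True

def process(**context):
    sources = context["ti"].xcom_pull(key="sources", task_ids="check_schedule")
    for source in sources:
        print(f"processing source={source}")

with DAG(
    dag_id="multi_source_schedule",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    check = ShortCircuitOperator(task_id="check_schedule", python_callable=pick_sources)
    run = PythonOperator(task_id="process_sources", python_callable=process)
    check >> run
```

On days not present in the mapping, the ShortCircuitOperator returns False and the downstream task is skipped, so a single daily schedule covers all three cases.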
I need a scheduled query to run only Monday to Friday, between 9:00 and 19:00:
The scheduled query is currently: every hour from 9:00 to 19:00
But how do I modify it for Monday to Friday?
every monday to friday from 9:00 to 19:00 is not working
every monday from 9:00 to 19:00 is working (so is a day-of-week range in general not working?)
Thanks
UPDATE: The question at hand is much more complex than the Custom setting in BigQuery Scheduled Queries allows. For this purpose, @guillaume blaquiere has the best suggestion: use Cloud Scheduler to run a cron job. Tools like Crontab Guru can be helpful in creating a statement such as 00 9-19 * * 1-5.
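As an illustration (not part of the original answer), the third-party croniter package can be used to sanity-check what that expression actually fires:

```python
# Illustration only: verify 00 9-19 * * 1-5 with the croniter package.
from datetime import datetime

from croniter import croniter

itr = croniter("00 9-19 * * 1-5", datetime(2021, 1, 1))  # Fri 2021-01-01
for _ in range(3):
    print(itr.get_next(datetime))
# 2021-01-01 09:00, 10:00, 11:00 ... hourly until 19:00,
# then nothing on Saturday/Sunday; resumes Monday 09:00.
```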
For simpler Scheduled Queries, please review the following from the official documentation: Set up scheduled queries.
Specifically: "To specify a custom frequency, select Custom, then enter a Cron-like time specification in the Custom schedule field; for example, every 3 hours."
There is excellent documentation in the Custom Interval tab here on the many options you have available in this field.
Thanks for the feedback. So like this one? But this is not working.
I want to configure a job that runs at dawn on the first Saturday of every month through Cloud Scheduler.
Wanting the Scheduler job frequency setting to be the first Saturday of every month, I configured it as follows.
ex) 45 2 1-7 * 6
However, I confirmed that the above schedule also ran on the 23rd, this past Saturday.
Is it not possible to configure a monthly schedule in Cloud Scheduler?
If you could give me an answer, I would be very grateful.
I have checked these links in relation to the above.
Your current schedule, 45 2 1-7 * 6, reads as At 02:45 on every day-of-month from 1 through 7 and on Saturday. You can check it on Crontab guru.
In order to set a custom interval, you will need to use the App Engine Cron format.
In this case, try first saturday of month 02:45.
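If you have to stay on the standard Unix cron format instead (where, as noted above, day-of-month and day-of-week restrictions are OR'ed together), a common workaround, offered here as a sketch rather than part of the original answer, is to schedule 45 2 * * 6 (every Saturday) and have the job itself skip all but the first Saturday:

```python
# Workaround sketch for plain cron: fire every Saturday (45 2 * * 6)
# and exit early unless today is the first Saturday of the month.
from datetime import date

def is_first_saturday(today: date) -> bool:
    # weekday() == 5 is Saturday; the first Saturday is always day 1-7.
    return today.weekday() == 5 and today.day <= 7

if not is_first_saturday(date.today()):
    raise SystemExit(0)  # not the first Saturday; skip this run
# ... actual job logic goes here ...
```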
Is there a way to add an expiry date to a Huey dynamic periodic task?
Just like the option in a Celery task, "some_celery_task.apply_async(args=('foo',), expires=expiry_date)", which adds an expiry date while creating the task.
I want to add the expiry date while creating the Huey dynamic periodic task. I used "revoke"; it worked as it is supposed to, but I want to stop the task completely after the expiry date, not just revoke it. When a Huey dynamic periodic task is revoked, a message is displayed on the Huey terminal that the function is revoked (whenever the crontab condition becomes true).
(I am using Huey in Django.)
(Extra)
What I did to meet this expiry-date need:
I created a function which returns day-month pairs for crontab.
For example:
start date = 2021-1-20, end date = 2021-6-14
then the function will return Days_Month: [['20-31', '1'], ['*', '2-5'], ['1-14', '6']]
Then I call the Huey dynamic periodic task once per pair (three times in this case).
(The Days_Month function returns day-month pairs per the requirement: daily, weekly, monthly, or repeating every n days; a sketch of such a helper follows.)
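A minimal sketch of such a helper, assuming start and end fall in the same calendar year; the function name mirrors the description above and is illustrative, not the asker's actual code:

```python
from datetime import date

def days_month_pairs(start: date, end: date):
    # Assumes start and end are in the same calendar year.
    # Days beyond a month's length (e.g. 31 in February) simply never match.
    if start.month == end.month:
        return [[f"{start.day}-{end.day}", str(start.month)]]
    pairs = [[f"{start.day}-31", str(start.month)]]
    if end.month - start.month > 1:
        pairs.append(["*", f"{start.month + 1}-{end.month - 1}"])
    pairs.append([f"1-{end.day}", str(end.month)])
    return pairs

# days_month_pairs(date(2021, 1, 20), date(2021, 6, 14))
# -> [['20-31', '1'], ['*', '2-5'], ['1-14', '6']]
```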
Is there a better way to do this?
Thank you for the help.
The best solution will depend on how often you need periodic tasks with a specific end date, but the ideal approach probably involves your database.
I would create a database model (let's call it Job) with fields for your end_date, a next_execution_date and a field that indicates the interval between repetitions (like x days).
You would then create a periodic task with huey that runs every day (or even every hour/minute if you need finer grain of control). Every time this periodic task runs you would then go over all your Job instances and check whether their next_execution_date is in the past. If so, launch a new huey task that actually executes the functionality you need to have periodically executed per Job instance. On success, you calculate the new next_execution_date using the interval.
So whenever you want a new Job with a new end_date, you can just create this in the django admin (or make an interface for it) and you would set the next_execution_date as the first date where you want it to execute.
Your final solution would thus have the Job model and two huey decorated functions. One for the periodic task that merely checks whether Job instances need to be executed and updates their next_execution_date and another one that actually executes the periodic functionality per Job instance. This way you don't have to do any manual cancelling and you only need 1 periodic task that just runs indefinitely but doesn't actually execute anything if there are no Job instances that need to be run.
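A minimal sketch of this approach; the model, field, and task names are hypothetical, not prescribed by huey:

```python
# models.py -- hypothetical Job model
from django.db import models

class Job(models.Model):
    end_date = models.DateTimeField()
    next_execution_date = models.DateTimeField()
    interval_days = models.PositiveIntegerField(default=1)

# tasks.py -- one dispatcher that runs indefinitely, one worker task
from datetime import timedelta

from django.utils import timezone
from huey import crontab
from huey.contrib.djhuey import db_periodic_task, db_task

from myapp.models import Job  # hypothetical app path

@db_task()
def execute_job(job_id):
    job = Job.objects.get(pk=job_id)
    # ... the actual periodic work for this Job goes here ...
    job.next_execution_date += timedelta(days=job.interval_days)
    job.save()

@db_periodic_task(crontab(minute="0"))  # dispatcher: check hourly
def run_due_jobs():
    now = timezone.now()
    due = Job.objects.filter(next_execution_date__lte=now, end_date__gte=now)
    for job in due:
        execute_job(job.pk)  # calling the decorated task enqueues it
```

The end_date filter means expired Job instances are simply never dispatched again, so nothing has to be revoked.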
Note: this will only be a reasonable approach if you have multiple of these tasks and you potentially want to control the end_dates in your interface.
Can we use a single AWS Data Pipeline to run every Monday, Wednesday, and Friday, without creating multiple pipelines? If we can, how?
I would like to order a job today in Control-M but have it run 8 days later. How would one go about doing this? Tomorrow the job should be ordered again for tomorrow's date and run 8 days from tomorrow, and so on.
If you want a job to run 8 days after it is ordered and it is not scheduled on that date, you can order out that job 7 days in advance, depending on the time. Or, if the requirement is that it should run permanently on calendar days, you can add a schedule and move the job to production so that it gets ordered out and runs automatically.