I'm using Airflow with Cloud Composer. I have a DAG scheduled to run every hour. However, I read the following in the Cloud Composer documentation: "Maintenance operations might impact the execution of your DAGs and Airflow tasks".
The cron schedule I created is supposed to run every hour, and I don't want any downtime due to maintenance windows.
I'm worried about problems caused by the selected maintenance windows. Can you give me more information about this? Is there an option to remove the maintenance window?
Thanks in advance.
As you mentioned, running DAGs during the maintenance window might cause scheduling or execution issues. This is also covered in the Cloud Composer documentation on troubleshooting scheduler issues:
You can define specific maintenance windows for your environment. During these time periods, maintenance events for Cloud SQL and GKE take place. Avoid scheduling DAG runs during maintenance windows because this might cause scheduling or execution issues.
Also, removing the maintenance window won't be considered as a feature going forward, per the Cloud Composer Issue Tracker:
Information from the engineering team is that this feature is not on the roadmap, so there will be no way in the future to remove the maintenance window once it's applied to the Composer environment.
Reason: if an environment doesn't have any maintenance windows, maintenance operations happen at random times; having a maintenance window allows maintenance operations to run in predictable slots.
Unfortunately, since your DAG runs every hour, your only option is to handle any scheduling or execution issues if and when you encounter them.
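If the hourly DAG must keep running across maintenance windows, one hedged mitigation is to configure task retries so that a run interrupted by maintenance is attempted again afterwards. Below is a minimal sketch assuming Airflow 1.x import paths; the DAG id, task, retry values, and command are placeholders, not anything from the original question.

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator  # Airflow 1.x import path

    default_args = {
        "owner": "airflow",
        "retries": 3,                          # re-attempt tasks that fail during maintenance
        "retry_delay": timedelta(minutes=15),  # wait between attempts
    }

    dag = DAG(
        dag_id="hourly_job",                   # placeholder DAG id
        default_args=default_args,
        schedule_interval="0 * * * *",         # top of every hour, as in the question
        start_date=datetime(2021, 1, 1),
        catchup=False,
    )

    work = BashOperator(
        task_id="do_hourly_work",
        bash_command="echo 'hourly work goes here'",  # placeholder command
        dag=dag,
    )

Retries don't remove the maintenance-window impact; they only keep a transient interruption from failing the hourly run outright.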
Workflow failed. Causes: The Dataflow job appears to be stuck because no worker activity has been seen in the last 1h. Please check the worker logs in Stackdriver Logging. You can also get help with Cloud Dataflow at https://cloud.google.com/dataflow/support.
I am using a service account with all required IAM roles.
Generally, "The Dataflow job appears to be stuck because no worker activity has been seen in the last 1h" can be caused by a setup phase that takes too long. To solve this, you can try increasing worker resources (via the --machine_type parameter).
For example, installing several dependencies that require building wheels (pystan, fbprophet) can take more than an hour on the minimal machine (n1-standard-1, with 1 vCPU and 3.75 GB RAM). Using a more powerful instance (n1-standard-4, which has four times the resources) can solve the problem.
You can debug this by looking at the worker startup logs in Cloud Logging; you are likely to see pip issues while installing dependencies.
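For illustration, here is a minimal sketch of setting the worker machine type from the Beam Python SDK when launching a Dataflow job; the project, region, bucket, and requirements file are placeholder assumptions.

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Placeholder values; replace with your own project, region, and bucket.
    options = PipelineOptions(
        runner="DataflowRunner",
        project="my-project",
        region="us-central1",
        temp_location="gs://my-bucket/tmp",
        machine_type="n1-standard-4",          # larger workers so dependency builds finish in time
        requirements_file="requirements.txt",  # dependencies installed at worker startup
    )

    # Trivial pipeline body, just to make the sketch runnable.
    with beam.Pipeline(options=options) as p:
        p | "Create" >> beam.Create(["hello", "world"]) | "Print" >> beam.Map(print)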
Do you have any error logs showing that Dataflow Workers are crashing when trying to start?
If not, the worker VMs may be starting but unable to reach the Dataflow service, which is often related to network connectivity.
Please note that by default, Dataflow creates jobs using the network and subnetwork named default (check whether they exist in your project); you can switch to a specific one by specifying --subnetwork. See https://cloud.google.com/dataflow/docs/guides/specifying-networks for more information.
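As a hedged sketch, the same option can be passed from Python the way you would pass --subnetwork on the command line; the subnetwork path below is a placeholder and must point to a subnetwork that actually exists in your project.

    from apache_beam.options.pipeline_options import PipelineOptions

    # Equivalent to passing --subnetwork on the command line (placeholder values).
    options = PipelineOptions([
        "--runner=DataflowRunner",
        "--project=my-project",
        "--region=us-central1",
        "--temp_location=gs://my-bucket/tmp",
        "--subnetwork=regions/us-central1/subnetworks/my-subnet",  # workers attach to this subnetwork
    ])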
I am deploying a pipeline to Google Cloud DataFlow using Apache Beam. When I want to deploy a change to the pipeline, I drain the running pipeline and redeploy it. I would like to make this faster. It appears from the logs that on each deploy DataFlow builds up new worker nodes from scratch: I see Linux boot messages going by.
Is it possible to drain the pipeline without tearing down the worker nodes so the next deployment can reuse them?
Rewriting Inigo's answer here:
To answer the original question: no, there's no way to do that, and updating should be the way to go. I was not aware that update was marked as experimental (we should probably change that), but the update approach has not changed in the last 3 years I have been using Dataflow. As for the special cases where update doesn't work: even if the feature you describe existed, the workers would still need the new code, so there isn't really much to save, and update should work in most other cases.
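As a hedged illustration of that update path: with the Beam Python SDK, a running streaming Dataflow job can be replaced in place by launching the new code with the --update flag and the same job name, instead of draining and redeploying. The project, bucket, and job names below are placeholders.

    from apache_beam.options.pipeline_options import PipelineOptions

    # Launch the new pipeline code against the existing streaming job (placeholder names).
    options = PipelineOptions([
        "--runner=DataflowRunner",
        "--project=my-project",
        "--region=us-central1",
        "--temp_location=gs://my-bucket/tmp",
        "--streaming",
        "--job_name=my-streaming-job",  # must match the name of the job being replaced
        "--update",                     # replace the running job instead of starting a new one
    ])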
I have been trying to find a way to save on the costs of Airflow by disabling it when not in use. I have discovered that if we disable the composer.googleapis.com service while it is not in use, Google does not charge for the service while it is disabled, although it does continue to charge for other resources that are still active. Unfortunately, if the service is disabled for more than an hour or so, it is not usable after re-enabling it. After the service has been disabled for an extended period of time, the Composer Environment Details page shows
An error occurred with retrieving the last operation on this environment
and
This environment cannot be edited due to the errors that occurred during environment creation/update. Please investigate the logs to determine the cause, or create a new environment.
And gcloud composer environments describe shows state: ERROR
The one error that I did see in the logs was a duplicate key error when the airflow_monitoring DAG was rescheduled after a little over an hour. I therefore created a new Composer environment, disabled all DAGs, disabled the Composer service, waited a while, then enabled it again. The environment was once again in an error state.
The Cloud Composer documentation states:
If you disable the Cloud Composer API, environments become unusable within an hour of service deactivation unless you re-enable the API. If you re-enable the API, you are billed for the service usage that occurs while the Cloud Composer service is deactivating.
Maybe this is poorly worded, but to me it sounds like the environment becomes unusable within an hour if you disable the API, but if you re-enable it any time later, it becomes usable again. I am wondering if it really means that if you disable it, you must re-enable it within an hour or it will become permanently unusable.
Is there a way to disable the composer.googleapis.com service for longer than an hour and then get it working again after the service has been re-enabled? Is there something I can restart, or some way to clear the error state? Is there more I should do before disabling it?
I am using composer-1.10.4-airflow-1.10.6 with Python 3.
Thanks.
No, there is no way to disable the composer.googleapis.com service for more than an hour and then have environments be functional after re-enablement.
GCP services are not meant to be enabled/disabled on the fly in this manner, and disablement of a service is meant to be performed with the intention of disabling it for the long term. Keeping a service disabled for long enough means Google-managed components created for the service (specifically for your project) will be decommissioned, and in Composer's case, this will render your environments permanently unusable.
The error state in the environment cannot be cleared. If you want to save on costs, you should delete Composer environments as opposed to deactivating the service entirely. The "service" is not cluster-like and isn't meant to be toggled on and off.
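If you do go the delete-and-recreate route, it can be scripted; here is a minimal sketch assuming the google-cloud-orchestration-airflow client library, with placeholder project, location, and environment names.

    from google.cloud.orchestration.airflow import service_v1

    # Placeholder identifiers; replace with your own project, region, and environment name.
    client = service_v1.EnvironmentsClient()
    name = "projects/my-project/locations/us-central1/environments/my-composer-env"

    # delete_environment returns a long-running operation; block until it completes.
    operation = client.delete_environment(name=name)
    operation.result()
    print("Environment deleted; recreate it when Airflow is needed again.")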
I have a use case where I schedule a task 24h into the future after an event occurs. This task represents some sort of "deadline" for other things to happen.
The scheduled task triggers the creation of a report. If not all of the above-mentioned "other things" have completed by that time, the report creation process creates the report anyway with the information it has at the time.
If, on the other hand, all the other things do complete before the 24h are up, then ideally I'd like to re-use the same Google Cloud Task to trigger the same process (it's identical to the previous case but will contain all of the available information).
I would imagine the easiest way to achieve the above is to:
schedule a task 24h into the future
if all information arrives: run the task early, before its scheduled time
However, reading through the Google Cloud Tasks documentation, I don't see an option to run a task early. That feature does exist in the Cloud Tasks console, though, so I was wondering whether it is also available in the API and client libraries.
Thanks!
This is probably what you're looking for
https://cloud.google.com/tasks/docs/reference/rest/v2/projects.locations.queues.tasks/run
Note, however, that the documentation says "This command is meant to be used for manual debugging".
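The same RunTask method is exposed through the client libraries as well; here is a minimal sketch with the Python client, using placeholder project, location, queue, and task IDs.

    from google.cloud import tasks_v2

    # Placeholder identifiers; replace with your own project, location, queue, and task ID.
    client = tasks_v2.CloudTasksClient()
    task_name = client.task_path("my-project", "us-central1", "my-queue", "my-task-id")

    # Forces the task to run now instead of waiting for its scheduled time.
    response = client.run_task(name=task_name)
    print("Task dispatched:", response.name)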
I want to schedule a job in SAS-DIS. I tried the process using SAS Management Console, but an error pops up saying the scheduling server was not found.
Can anyone help me set up a scheduling server? Or is it software that has to be installed?
Thanks
I think a scheduling server is an extra package that has to be purchased. Our BI setup lacks that option, and no matter what, we can't seem to get it approved. Check with your SAS server admin to see whether job scheduling has been enabled; if so, he or she should be able to tell you the process for getting your job scheduled.
Alternatively, without a scheduling server you can still deploy your jobs and use either
1. cron and crontab (on Unix or Linux), or
2. the Windows OS scheduler
to schedule jobs manually; this is the best option available if there is no scheduling server. I know this can be tedious and cumbersome, but it's worth a try if you only have a small number of jobs to schedule.