I am monitoring DAG runs in Cloud Composer using Cloud Logging. Airflow task run logs show up most of the time, but once in a while some logs are missing on the Cloud Logging side. When I check Airflow, the task logs are there, yet they never appear in Cloud Logging. It happens specifically with TriggerDagRunOperator: some runs have full logs and some don't, so it appears random. Is this a bug, or am I missing something?
Log query:
resource.type="cloud_composer_environment" AND resource.labels.environment_name=sddasdadad AND log_name="projects/pd-sdesdsdsaed/logs/airflow-worker" AND labels.workflow="dadAsdadsadAsdASd" AND labels.execution-date = "2023-01-05T17:00:00+00:00"
I expect to see the Composer logs on the Cloud Logging side.
AWS MWAA is configured for remote logging to a given S3 path.
While running tasks in any DAG (especially long, time-consuming ones), we cannot see logs either in the Airflow UI or at the S3 path.
Whenever a task ends (success or failure), we are able to see the logs in both places.
Is there any way to see logs while tasks are still running when remote logging is configured?
GCP Cloud Composer 2 shows unhealthy, but everything works. It looks like the service account cannot access an API used to calculate the health metrics.
Composer version: composer-2.0.25-airflow-2.2.5
Error from server (Forbidden): events is forbidden: User "xxxxx.iam.gserviceaccount.com" cannot list resource "events" in API group "" in the namespace "airflow-2-2-5-11111111": GKEAutopilot authz: the request was sent before policy enforcement is enabled
timestamp: "2022-10-07T15:48:08.834865950Z"
Does anybody know why this is happening? Which API does Composer call to calculate health? What are these "events"?
The environment health metric depends on a Composer-managed DAG named airflow_monitoring, which is triggered periodically by the airflow-monitoring pod.
The airflow_monitoring DAG is a per-environment liveness prober/healthcheck that is used to populate the Cloud Composer monitoring metric environment/healthy. It is an indicator of the general health of your environment, or more specifically, of its ability to schedule DAGs and run tasks. This lets you use Google Cloud Monitoring features such as metric graphs, or alerts that fire when your environment becomes unhealthy.
If this DAG hasn't been deleted, you can check the airflow-monitoring logs to see whether there are any problems reading the DAG's run statuses. You can also try troubleshooting the error in Cloud Logging using the filter:
resource.type="cloud_composer_environment"
severity=ERROR
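To narrow this down to the monitoring DAG itself, you can additionally restrict the log name (assuming your environment emits the airflow-monitoring log that Composer environments normally write; PROJECT_ID is a placeholder):
resource.type="cloud_composer_environment"
log_name="projects/PROJECT_ID/logs/airflow-monitoring"
severity>=ERROR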
You can find more information about the metric on the GCP Metrics List, and can explore the metric in Cloud Monitoring.
Also, you can refer to this documentation for information on Cloud Composer’s environment health metric.
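If you want to chart or alert on the health metric directly, a Metrics Explorer or alerting filter would look roughly like this (composer.googleapis.com/environment/healthy is the metric type named in the documentation; which labels you filter on depends on your environment):
metric.type="composer.googleapis.com/environment/healthy"
resource.type="cloud_composer_environment"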
I am trying to build a Python Cloud Run service that is triggered whenever a file is uploaded to a Google Cloud Storage bucket. However, when I look at the logs, the service is not being triggered, even though I have already created an Eventarc trigger for it. I cannot find any entries in the Cloud Run service logs, but the trigger tab shows an Eventarc trigger associated with the service.[![Cloud Run Trigger Image][1]][1]
[![Cloud Run Logs][2]][2]
Any ideas or links that can help me here?
[1]: https://i.stack.imgur.com/ijjh2.png
[2]: https://i.stack.imgur.com/QhFhk.png
In your logs, the line
booting worker with pid: 4
indicates that your Cloud Run instance did indeed get triggered, but it might have failed to boot, because there is no further log output.
To debug, deploy a demo Cloud Run service that just logs the incoming message, as sketched below. That makes it easier to see whether it has been triggered (and with what payload).
There is a simple tutorial from Google along these lines.
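A minimal sketch of such a demo service, assuming Python and Flask (the framework and names are illustrative, not necessarily your actual stack); it only logs whatever Eventarc delivers:
import json
import os

from flask import Flask, request

app = Flask(__name__)

@app.route("/", methods=["POST"])
def handle_event():
    # Eventarc delivers the event as an HTTP POST: CloudEvents attributes
    # arrive as headers, and the event payload is in the request body.
    print("Headers:", dict(request.headers))
    print("Body:", json.dumps(request.get_json(silent=True)))
    return ("", 204)

if __name__ == "__main__":
    # Cloud Run injects the PORT environment variable.
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", "8080")))

If this demo logs the payload but your real service doesn't, the problem is in your service's startup or event handling rather than in the Eventarc trigger.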
I am running a python script in Cloud Run on a daily basis with Cloud Scheduler to pull data from BigQuery and upload it to Google Cloud Storage as a CSV file. The Cloud Scheduler setup utilizes an HTTP "Target" with a GET "HTTP method". Also, Cloud Scheduler authenticates the https endpoint using a service account with the "Add OIDC token" option.
When running Cloud Scheduler and Cloud Run with a very small subset of the BigQuery data for a job that takes a few seconds, the "Result" in Cloud Scheduler always shows "Success" and the job completes as intended. However, when running Cloud Scheduler and Cloud Run with the full BigQuery dataset for a job that takes a few minutes, the "Result" in Cloud Scheduler always shows "Failed", even though the CSV file is typically (although not always) uploaded into Google Cloud Storage as intended.
(1) When running Cloud Scheduler and Cloud Run on the full BigQuery dataset, why does the "Result" in Cloud Scheduler always show "Failed", even though the job is typically finishing as intended?
(2) How can I fix Cloud Scheduler and Cloud Run to ensure the job always completes as intended and the "Result" in Cloud Scheduler always shows "Success"?
It's a common pitfall with Cloud Scheduler. I have raised it with Google many times, but nothing has changed so far...
The GUI (the web console) doesn't let you configure everything, in particular the timeout. Your Cloud Scheduler job fails because it considers that it didn't receive the answer in time when you scan your full BigQuery dataset (which can take a few minutes).
To solve this, use the command line (gcloud), especially the attempt-deadline parameter. Have a look at the other parameters too: retry, backoff, ... The customization they allow is interesting, but it isn't exposed in the GUI!
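For example, with gcloud (JOB_NAME and the values below are placeholders; as far as I know, 30 minutes is the maximum attempt deadline for HTTP targets):
# Raise the attempt deadline so Cloud Scheduler waits for the long-running
# export instead of marking the run as failed, and tune retries/backoff.
gcloud scheduler jobs update http JOB_NAME \
    --attempt-deadline=30m \
    --max-retry-attempts=3 \
    --min-backoff=30s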
How can I monitor the Airflow web server when using Google Cloud Composer? If the web server goes down or crashes due to an error, I would like to receive an alert.
You can use Stackdriver Monitoring: https://cloud.google.com/composer/docs/how-to/managing/monitoring-environments. Alerts can also be set in Stackdriver.
At this time, fine-grained metrics for the Airflow web server are not exported to Stackdriver, so it cannot be monitored like other resources in a Cloud Composer environment (such as the GKE cluster, GCE instances, etc). This is because the web server runs in a tenant project, separate from the main project where most of your environment's resources live.
However, web server logs for Airflow in Composer are now visible in Stackdriver as of March 11, 2019. That means for the time being, you can configure logs-based metrics for the web server log (matching on lines that contain Traceback, etc).
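A logs-based metric along those lines could use a filter like the following (PROJECT_ID is a placeholder, and the airflow-webserver log name is an assumption about how Composer exports the web server log):
resource.type="cloud_composer_environment"
log_name="projects/PROJECT_ID/logs/airflow-webserver"
textPayload:"Traceback"
You can then create an alerting policy that fires when the metric's count is greater than zero.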