Cloud Composer - DAG Task Log File is not Found - google-cloud-platform

For a few days now, some tasks have been throwing an error at the start of every DAG run. It seems the log file cannot be found when retrieving the task's logs.
*** 404 GET https://storage.googleapis.com/download/storage/v1/b/europe-west1-ventis-brand-f-65ab79d1-bucket/o/logs%2Fimport_ebay_snapshot_feeds_ES%2Fstart%2F2021-11-30T08%3A00%3A00%2B00%3A00%2F3.log?alt=media: No such object: europe-west1-ventis-brand-f-65ab79d1-bucket/logs/import_ebay_snapshot_feeds_ES/start/2021-11-30T08:00:00+00:00/3.log: ('Request failed with status code', 404, 'Expected one of', <HTTPStatus.OK: 200>, <HTTPStatus.PARTIAL_CONTENT: 206>)
I've upgraded to the latest version of Cloud Composer and the tasks run on Python 3.
Here is the environment configuration:
Resources:
Workloads configuration:
Scheduler:
1 vCPU, 2 GB memory, 2 GB storage
Number of schedulers: 2
Web server:
1 vCPU, 2 GB memory, 2 GB storage
Worker:
1 vCPU, 2 GB memory, 1 GB storage
Number of workers:
Autoscaling between 1 and 4 workers
Core infrastructure:
Environment size: Small
GKE cluster
projects/*********************************
There are no related issues regarding this error in the Cloud Composer changelog.
How could this be fixed?

It's a bug in the Cloud Composer environment and has already been reported. You can track this conversation: Re: Log not visible via Airflow web log, and other similar forums. To fix the issue, it's recommended to update your Composer environment or use a stable version.
There are some workarounds suggested (you can try them in this order; they are independent of each other; a command sketch for the first two follows the list):
Remove the logs from the /logs folder in the Composer GCS bucket and archive them somewhere else (outside of the /logs folder).
or
Manually update the web server configuration to read logs directly from a new bucket in your project. You would first need to grant viewer roles (like roles/storage.legacyBucketReader and roles/storage.legacyObjectReader) on the bucket to the service account running the web server.
Edit /home/airflow/gcs/airflow.cfg and set remote_base_log_folder = <newbucket>, with the proper permissions granted as described above.
or
If you don't have DRS (Domain Restricted Sharing) enabled, which I believe you don't, you can create a new Composer environment, this time through the v1 Composer API or without Beta features enabled in the Cloud Console. This way Composer will create the environment without the DRS-compliant setup, i.e. without the bucket-to-bucket synchronization.
The problem is that you would need to migrate your DAGs and data to the new environment.
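For illustration, here is a minimal shell sketch of the first two workarounds. All bucket and service-account names below are placeholders, not values from your environment:
# Workaround 1: archive existing task logs outside the Composer /logs folder
COMPOSER_BUCKET=europe-west1-example-bucket        # your Composer environment bucket (placeholder)
ARCHIVE_BUCKET=my-logs-archive                     # any bucket outside the /logs folder (placeholder)
gsutil -m mv "gs://${COMPOSER_BUCKET}/logs/*" "gs://${ARCHIVE_BUCKET}/composer-logs-archive/"
# Workaround 2: grant the web server's service account read access on a new log bucket
NEW_LOG_BUCKET=my-new-airflow-logs                                 # placeholder
WEB_SERVER_SA=webserver-sa@my-project.iam.gserviceaccount.com      # placeholder
gsutil iam ch "serviceAccount:${WEB_SERVER_SA}:roles/storage.legacyBucketReader" "gs://${NEW_LOG_BUCKET}"
gsutil iam ch "serviceAccount:${WEB_SERVER_SA}:roles/storage.legacyObjectReader" "gs://${NEW_LOG_BUCKET}"
# ...then set remote_base_log_folder = gs://my-new-airflow-logs in /home/airflow/gcs/airflow.cfg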

Related

How to find when my VM was resized in GCP?

I created a VM in GCP with a 2-core, 8 GB RAM config; later I noticed it was changed to 4 cores and 16 GB RAM. I need to find out who on my team did this and when.
I tried going through the activity dashboard, but it's quite difficult to understand from that. Can anyone provide a solution to this?
Changes to a Compute Engine configuration will be logged in the Admin Activity audit logs. The IAM identity that changed the instance will be logged.
The following CLI command will read the log. Replace PROJECT_ID with your Project ID.
gcloud logging read "logName : projects/PROJECT_ID/logs/cloudaudit.googleapis.com" --project=PROJECT_ID
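If the unfiltered read is too noisy, you can narrow the query down to machine-type changes on Compute Engine instances. A hedged example follows; the substring match on the method name is an assumption and the exact methodName value may differ by API version:
gcloud logging read 'logName:"cloudaudit.googleapis.com%2Factivity" AND resource.type="gce_instance" AND protoPayload.methodName:"setMachineType"' --project=PROJECT_ID --freshness=30d --format="table(timestamp, protoPayload.authenticationInfo.principalEmail, protoPayload.methodName)"
The table output shows when the change happened and which IAM identity made it.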
Understanding audit logs
Compute Engine audit logging information
gcloud logging read

Atlantis plan erroring with querying Cloud Storage failed message

I have a GCP VM to which a GCP Service Account has been attached.
This SA has the appropriate permissions to perform some terraform / terragrunt related actions, such as querying the backend configuration GCS bucket etc.
So, when I log in to the VM (to which I have already transferred my terraform configuration files), I can, for example, do
$ terragrunt plan
Initializing the backend...
Successfully configured the backend "gcs"! Terraform will automatically
use this backend unless the backend configuration changes.
Initializing provider plugins...
- terraform.io/builtin/terraform is built in to Terraform
- Finding hashicorp/random versions matching "3.1.0"...
- Finding hashicorp/template versions matching "2.2.0"...
- Finding hashicorp/local versions matching "2.1.0"...
.
.
.
(...and the plan goes on)
I have now set up Atlantis to run as a systemd service (under a user with the same name).
The problem is that when I create a PR, the plan (as posted as a PR comment) fails as follows:
Initializing the backend...
Successfully configured the backend "gcs"! Terraform will automatically
use this backend unless the backend configuration changes.
Failed to get existing workspaces: querying Cloud Storage failed: storage: bucket doesn't exist
Does anyone know (or suspect) whether this problem may be related to the fact that the terraform service account is not / cannot be used by the systemd service running Atlantis? (Because the bucket is there, since I am able to plan manually.)
Update: I have validated that a systemd service does inherit the GCP SA by creating a systemd service that just runs this script:
#!/bin/bash
gcloud auth list
and this does output the SA of the VM.
So I changed my original question since this apparently is not the issue.
Posting my comment as an answer for visibility to other community members.
You were maybe getting the error because there can be an issue with the Terraform configuration. To update it, please run the following command and see if it solves your issue.
terraform init -reconfigure
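If the reconfigure alone doesn't help, one way to narrow things down is to reproduce the plan under the same user that the systemd service runs as. A rough sketch, where the unit name, user name, and repository path are assumptions:
# Inspect the unit Atlantis runs under (unit name "atlantis" is an assumption)
systemctl cat atlantis
# Re-run the backend init and plan as the Atlantis user from its working copy (path is a placeholder)
sudo -u atlantis -H bash -c 'cd /path/to/atlantis/repos/my-repo && terragrunt init -reconfigure && terragrunt plan'
If the manual run as that user succeeds but the Atlantis-triggered plan still fails, the difference is likely in the environment the service passes to terraform rather than in the attached service account.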

Why can Cloud Composer 2 not connect to Redis during environment generation?

I have tried multiple times to create a new Cloud Composer 2 environment. While creating a Cloud Composer 1 environment worked completely fine, this keeps on failing. From what I can see in the logs it appears that airflow-worker cannot connect to Redis. I already made sure the Cloud Composer service account has sufficient rights. What else could be the cause of this?

How to use multiple DagBags in Airflow with WebUI

We are running Airflow 1.10.3 via Google Cloud Composer.
Our DAGs are distributed over several folders that we collect via instances of DagBag (like here: https://medium.com/@xnuinside/how-to-load-use-several-dag-folders-airflow-dagbags-b93e4ef4663c).
However, the WebUI apparently can't find any DAGs that are not in the main dag folder (the one configured in airflow.cfg).
This seems to be because in airflow.www.views there is only one global variable dagbag.
Is that really the problem? What could be a workaround?
Additional info:
airflow list_dags shows all dags
the DAGs are also listed in the WebUI and seem to get scheduled, but clicking a DAG in the WebUI only yields the error " does not seem to be in dagbag"
I'm curious to hear about your thoughts, since I'm pretty lost here.
According to the Cloud Composer documentation, the dags_folder parameter is blocked and can't be overridden (you're only allowed to use the GCS bucket created by the Cloud Composer environment). This ensures that Cloud Composer can upload DAGs and that the dags folder remains in the Google Cloud Storage bucket.
Since the DagBag can't be modified, and because Apache Airflow does not provide strong DAG isolation, it's recommended that you keep your DAGs separated in different environments to prevent DAG interference.
I've made some tests in my Composer environment, creating multiple folders to separate my DAGs.
In all cases my DAGs were recognized and ran as expected, even in sub-folders:
$ gcloud beta composer environments storage dags list --environment=$ENVIRONMENT --location=us-east1
NAME
dags/
dags/airflow_monitoring.py
dags/dev/
dags/dev/airflow_monitoring_dev.py
dags/qa/
dags/qa/airflow_monitoring_qa.py
dags/qa/qa_test1/
dags/qa/qa_test1/airflow_monitoring_qa_test1.py
If recreating your folders inside the dags folder created by Composer is not feasible for you, I recommend synchronizing the content of your own bucket with the Composer dags folder; with the rsync command you can mirror both buckets.
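As an example, a hedged one-liner that mirrors a hypothetical source bucket into the Composer dags folder (both bucket names are placeholders; gsutil rsync is one way to do the mirroring):
gsutil -m rsync -r gs://my-own-dags-bucket gs://us-east1-my-environment-bucket/dags
# add -d to also delete remote files missing from the source, but use it with care so you don't remove files Composer manages (e.g. airflow_monitoring.py)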

How to increase gitlab job concurrency on gke kubernetes runner?

I created two GitLab runners through the standard default runner creation UI in GitLab (3-node n1-standard-4 GKE cluster). I've been trying to get my GitLab runner to handle more than the default 4 concurrent jobs, but for some reason the limit is still capped at running only 4 jobs at once.
In GCP I changed the concurrent value in config.toml from 4 to 20, under the config maps runner-gitlab-runner and values-content-configuration-runner that were generated in my cluster (found under the https://console.cloud.google.com/kubernetes/config menu).
What else do I need to change to allow my gitlab runners to run more than 4 jobs at once?
Do I need to change the limit in the runner options? If so, where would I find that config in GCP?
Changing the config maps in GCP will not immediately update the cluster. You need to manually reload the deployment with:
kubectl rollout restart deployment/runner-gitlab-runner -n gitlab-managed-apps
There is also a button in the GCP cluster menu that will pull up a terminal with the kubeconfig of your cluster in a browser window.
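For completeness, a rough sketch of checking and editing the value from a terminal, using the config map and namespace names mentioned above (the exact location of the concurrent key inside the map may vary):
# Check the current value of concurrent in the runner config map
kubectl -n gitlab-managed-apps get configmap runner-gitlab-runner -o yaml | grep concurrent
# Edit it in place (set concurrent = 20), then restart the deployment as shown above
kubectl -n gitlab-managed-apps edit configmap runner-gitlab-runner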