Google Cloud Composer uses Cloud Storage to store Apache Airflow DAGs, but where are the operators stored? I am getting the error below:
Broken DAG: [/home/airflow/gcs/dags/example_pubsub_flow.py] cannot import name PubSubSubscriptionCreateOperator.
This operator was added in Airflow 1.10.0. As of today, Cloud Composer is still using Airflow 1.9.0, hence this operator is not available yet. You can add it as a plugin.
Apparently, according to a message in the Composer Google Group list, you do not need to add the plugin boilerplate to install a contrib operator as a plugin.
It is enough to register the plugin with this command:
gcloud beta composer environments storage plugins import --environment dw --location us-central1 --source=custom_operators.py
See here for detail.
The drawback is that if your contrib operator uses other operators, you will also have to copy those and change the way they are imported in Python, using:
from my_custom_operator import MyCustomOperator
instead of:
from airflow.contrib.operators.my_custom_operator import MyCustomOperator
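If the same DAG file should keep working once the operator ships with your Airflow version, one option is to try the contrib path first and fall back to the plugin module. This is only a minimal sketch: import_first is a hypothetical helper, not part of Airflow, and the module names are the placeholder names from the example above.

```python
import importlib


def import_first(module_names, attribute):
    """Return `attribute` from the first module in `module_names` that provides it."""
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # module not available in this environment, try the next one
        if hasattr(module, attribute):
            return getattr(module, attribute)
    raise ImportError(
        "cannot import name %r from any of %r" % (attribute, list(module_names))
    )


# Usage in a DAG file (placeholder module names from the example above):
# MyCustomOperator = import_first(
#     ["airflow.contrib.operators.my_custom_operator", "my_custom_operator"],
#     "MyCustomOperator",
# )
```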
I want to launch a Spark job on EMR Serverless from Airflow. I want to use Spark 3.3.0 and Scala 2.13, but the EMR 6.9.0 release ships with Scala 2.12. I created a fat JAR including all Spark dependencies, and that doesn't work either. As an alternative, I am trying to use an EMR custom image by creating an application with --image-configuration through the Airflow operator, but the operator just won't pass through all the arguments from the boto API.
create_app = EmrServerlessCreateApplicationOperator(
    task_id="create_my_app",
    job_type="SPARK",
    release_label="emr-6.9.0",
    config={
        "name": "data-ingestion",
        "imageConfiguration": {
            "imageUri": "xxxxxxx.dkr.ecr.eu-west-1.amazonaws.com/emr-custom-image:0.0.1"
        },
    },
)
Airflow gives the following error message:
Unknown parameter in input: "imageConfiguration", must be one of:
name, releaseLabel, type, clientToken, initialCapacity, maximumCapacity, tags, autoStartConfiguration, autoStopConfiguration, networkConfiguration
This other config won't work either:
    config={"name": "data-ingestion",
            "imageUri": "xxxxxxx.dkr.ecr.eu-west-1.amazonaws.com/emr-custom-image:0.0.1"})
Does anybody have any ideas other than downgrading my Scala version?
The Airflow operator passes the arguments to the boto3 client, and this client creates the application.
The imageConfiguration option was added to the boto3 client in 1.26.44 (PR), and the other options were added in different versions (please check the changelog).
So you can try to upgrade the version of boto3 on your Airflow server, provided that it is compatible with the other dependencies. If it is not, you may need to upgrade your Airflow version.
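Before changing the DAG, it may help to confirm which boto3 version the Airflow workers actually run. The sketch below compares a version string against the 1.26.44 threshold quoted in the answer above; the function names are hypothetical, and the threshold is taken from that answer rather than verified here.

```python
import re

# Version in which imageConfiguration became available, according to the answer above.
MIN_BOTO3_FOR_IMAGE_CONFIGURATION = (1, 26, 44)


def parse_version(version):
    """Turn a version string such as '1.26.44' into a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return tuple(int(part) for part in match.groups())


def supports_image_configuration(boto3_version):
    """True if the given boto3 version is at or above the quoted threshold."""
    return parse_version(boto3_version) >= MIN_BOTO3_FOR_IMAGE_CONFIGURATION


# Usage inside a task running on the Airflow worker:
# import boto3
# print(boto3.__version__, supports_image_configuration(boto3.__version__))
```

Running the check from inside a task matters because the version installed on your laptop can differ from the one on the workers.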
I am working on a migration task, which involves copying variables from one Cloud Composer version to another Cloud Composer version using Cloud Shell.
Is it possible to accomplish such a task in Cloud Shell?
I have read the Airflow and Composer documentation, but I cannot find a working command that will allow me to copy variables from one Composer version to another.
There isn't a single Airflow CLI command to "move" Variables from one Airflow environment to another; however, you can export Variables from the source environment to a file and then import from the same file into the target environment.
Something like this:
Export from source environment
gcloud composer environments run SOURCE_ENVIRONMENT_NAME \
    --location SOURCE_LOCATION \
    variables export \
    -- my_file.json
Import to target environment
gcloud composer environments run TARGET_ENVIRONMENT_NAME \
    --location TARGET_LOCATION \
    variables import \
    -- my_file.json
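Note that the export writes the file inside the source environment, so it has to reach the target environment before the import can read it. The sketch below builds the three commands as argument lists, assuming the file lives in the data/ folder of each environment's bucket (mounted at /home/airflow/gcs/data) and is copied between buckets with gsutil cp. The function name, the dict keys, and the bucket names are hypothetical.

```python
def migration_commands(source, target, filename="variables.json"):
    """Return the export, copy and import commands, in order, as argument lists.

    `source` and `target` are dicts with the keys "name", "location" and "bucket".
    """
    # Assumption: data/ in the environment bucket is visible at this path.
    local_path = "/home/airflow/gcs/data/" + filename

    def airflow_variables(env, action):
        return [
            "gcloud", "composer", "environments", "run", env["name"],
            "--location", env["location"],
            "variables", action, "--", local_path,
        ]

    return [
        airflow_variables(source, "export"),
        ["gsutil", "cp",
         "gs://%s/data/%s" % (source["bucket"], filename),
         "gs://%s/data/%s" % (target["bucket"], filename)],
        airflow_variables(target, "import"),
    ]


# Usage from Cloud Shell:
# import subprocess
# for command in migration_commands(source_env, target_env):
#     subprocess.check_call(command)
```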
I have Airflow running in AWS MWAA and would like to access the REST API. There are two ways to do this, but neither seems to work for me.
1. Overriding api.auth_backend. This used to work, but AWS MWAA no longer allows it: the option is on the blocklist and is rejected.
api.auth_backend = airflow.api.auth.backend.default
2. Using the MWAA CLI (Python). This doesn't work if any of the DAGs uses packages that are listed in the requirements.txt file.
a. As an example, I have "paramiko" in requirements.txt because I have a task that uses SSHOperator. The MWAA CLI fails with "no module paramiko".
b. Also noted here, https://docs.aws.amazon.com/mwaa/latest/userguide/access-airflow-ui.html
"Any command that parses a DAG (such as list_dags, backfill) will fail if the DAG uses plugins that depend on packages that are installed through requirements.txt."
We are using MWAA 2.0.2 and managed to reach Airflow's REST API functionality through the MWAA CLI, basically following the instructions and sample code in the Apache Airflow CLI command reference. You'll notice that not all REST API calls are supported, but many of them are (even when you have a requirements.txt in place).
Also have a look at the AWS sample code on GitHub.
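The AWS samples follow roughly this pattern: request a CLI token, POST the Airflow command as plain text to the environment's /aws_mwaa/cli endpoint, and base64-decode the stdout and stderr fields of the JSON reply. The sketch below covers only the request-building and decoding steps so that it runs without AWS access; the function names and the environment name in the usage comment are hypothetical.

```python
import base64
import json


def build_cli_request(web_server_hostname, cli_token, command):
    """Return (url, headers, body) for a POST to the MWAA CLI endpoint."""
    url = "https://%s/aws_mwaa/cli" % web_server_hostname
    headers = {
        "Authorization": "Bearer " + cli_token,
        "Content-Type": "text/plain",
    }
    return url, headers, command


def decode_cli_response(response_text):
    """Decode the base64-encoded stdout and stderr in the endpoint's JSON reply."""
    payload = json.loads(response_text)
    return {
        stream: base64.b64decode(payload.get(stream, "")).decode("utf-8")
        for stream in ("stdout", "stderr")
    }


# Usage (needs boto3, requests and AWS credentials; "my-environment" is a placeholder):
# token = boto3.client("mwaa").create_cli_token(Name="my-environment")
# url, headers, body = build_cli_request(
#     token["WebServerHostname"], token["CliToken"], "dags trigger my_dag")
# print(decode_cli_response(requests.post(url, headers=headers, data=body).text))
```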
How can I import a JSON file of variables into Google Cloud Composer from the command line?
I tried the below command
gcloud composer environments run comp-env --location=us-central1 variables -- --import composer_variables.json
I am getting the below error
[2019-01-17 13:34:54,003] {configuration.py:389} INFO - Reading the config from /etc/airflow/airflow.cfg
[2019-01-17 13:34:54,117] {app.py:44} WARNING - Using default Composer Environment Variables. Overrides have not been applied.
Missing variables file.
But when I set a single variable using below command it works fine.
gcloud composer environments run comp-env --location=us-central1 variables -- --set variable_name variable_value
Since I have more than 75 variables to import, I need to import them from a JSON file. Please help me resolve this issue.
The command gcloud composer environments run {environment-name} variables -- --i {path-to-json-file} executes airflow variables remotely inside the Airflow containers. Hence the JSON file needs to be accessible within the Airflow worker/scheduler pod. So you'll need to copy your var.json to GCS first and then run the command. For example:
gcloud composer environments storage data import --source=your-var.json --environment={environment-name} --location={location}
gcloud composer environments run {environment-name} --location={location} variables -- --i /home/airflow/gcs/data/your-var.json
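With 75 or more variables it can be easier to generate the JSON file than to write it by hand. The import expects a single flat JSON object mapping variable names to values. A minimal sketch, where write_variables_file is a hypothetical helper and the variable names in the usage comment are made up:

```python
import json


def write_variables_file(variables, path):
    """Write `variables` as the flat JSON object that `airflow variables` imports."""
    for key in variables:
        if not isinstance(key, str):
            raise TypeError("variable names must be strings, got %r" % (key,))
    with open(path, "w") as handle:
        json.dump(variables, handle, indent=2, sort_keys=True)
    return len(variables)


# Usage (made-up variable names):
# write_variables_file({"gcp_project": "my-project", "retries": 3}, "your-var.json")
```

The resulting your-var.json can then be uploaded and imported with the two commands above.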
We recently ran into a known issue in Airflow:
Airflow "This DAG isnt available in the webserver DagBag object "
For now we use a temporary solution: restarting the whole environment by changing its configuration. This is not an efficient method.
We think the best workaround is to restart the web servers on Cloud Composer, but we didn't find any command to restart the web server. Is this possible?
Thanks!
For those who wander in and find this thread: for versions >= 1.13.1, Composer currently has a preview feature for restarting the web server.
Only certain types of updates will cause the webserver container to be restarted, like adding, removing, or upgrading one of the PyPI packages or like changing an Airflow setting.
For example:
# Set some arbitrary Airflow config value to force a webserver rebuild.
gcloud composer environments update ${ENVIRONMENT_NAME} \
    --location=${ENV_LOCATION} \
    --update-airflow-configs=dummy=true
# Remove the previously set config value.
gcloud composer environments update ${ENVIRONMENT_NAME} \
    --location=${ENV_LOCATION} \
    --remove-airflow-configs=dummy
From Google Cloud Docs:
gcloud beta composer environments restart-web-server ENVIRONMENT_NAME --location=LOCATION
I finally found an alternative solution!
Based on this document:
https://cloud.google.com/composer/docs/how-to/managing/deploy-webserver
We can run the Airflow web servers on Kubernetes ourselves (yes, please throw away the built-in web server). Then we can kill the web server pods to force a restart =)
From the console, DAGs can be retrieved and we can list all the DAGs present. There are other commands too.