Import variables using a JSON file in Google Cloud Composer

How can I import a json file into Google Cloud Composer using command line?
I tried the below command
gcloud composer environments run comp-env --location=us-central1 variables -- --import composer_variables.json
I am getting the below error
[2019-01-17 13:34:54,003] {configuration.py:389} INFO - Reading the config from /etc/airflow/airflow.cfg
[2019-01-17 13:34:54,117] {app.py:44} WARNING - Using default Composer Environment Variables. Overrides have not been applied.
Missing variables file.
But when I set a single variable using the command below, it works fine.
gcloud composer environments run comp-env --location=us-central1 variables -- --set variable_name variable_value
Since I have more than 75 variables to import, I need to import them using a JSON file. Please help me resolve this issue.

The command gcloud composer environments run {environment-name} variables -- --i {path-to-json-file} executes airflow variables remotely inside the Airflow containers. Hence the JSON file needs to be accessible within the Airflow worker/scheduler pod, so you'll need to copy your var.json to GCS first and then run the command. For example:
gcloud composer environments storage data import --source=your-var.json --environment={environment-name} --location={location}
gcloud composer environments run {environment-name} --location={location} variables -- --i /home/airflow/gcs/data/your-var.json
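For reference, Airflow's variables import expects a flat JSON object mapping variable names to values. Below is a minimal sketch of the end-to-end flow, reusing the environment name from the question; the variable names and values are just placeholders:
# Write an example variables file (placeholder keys/values)
cat > composer_variables.json <<'EOF'
{
  "env": "prod",
  "gcs_bucket": "my-data-bucket",
  "retry_count": "3"
}
EOF
# Upload the file to the environment's data/ folder so the workers can read it
gcloud composer environments storage data import --source=composer_variables.json --environment=comp-env --location=us-central1
# Run the import against the GCS-mounted path
gcloud composer environments run comp-env --location=us-central1 variables -- --import /home/airflow/gcs/data/composer_variables.json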

Related

GCP Serverless pyspark : Illegal character in path at index

I'm trying to run a simple hello world Python script on serverless PySpark on GCP using gcloud (from a local Windows machine).
if __name__ == '__main__':
    print("Hello")
This always results in the error
=========== Cloud Dataproc Agent Error ===========
java.lang.IllegalArgumentException: Illegal character in path at index 38: gs://my-bucket/dependencies\hello.py
at java.base/java.net.URI.create(URI.java:883)
at com.google.cloud.hadoop.services.agent.job.handler.AbstractJobHandler.registerResourceForDownload(AbstractJobHandler.java:592)
The gcloud command:
gcloud dataproc batches submit pyspark hello.py --batch=hello-batch-5 --deps-bucket=my-bucket --region=us-central1
On further analysis, I found that gcloud puts the hello.py file at dependencies\hello.py under the {deps-bucket} folder, and Java considers the backslash '\' an illegal character.
Has anyone encountered a similar situation?
As @Ronak mentioned, can you double-check the bucket name? I have replicated your task by simply copying your code into my Google Cloud Shell, and it ran just fine. For your next run, can you delete the dependencies folder and run the batch job again?
(Screenshots omitted: my replication of the job and the dependencies path created after running it.)
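A possible workaround if the issue persists when submitting from Windows (an assumption on my part, not something from the answer above): upload the script yourself with a forward-slash path and pass the gs:// URI directly, so gcloud never has to stage it under a backslashed dependencies path. The batch name below is a fresh placeholder:
# Stage the script manually with a forward-slash path
gsutil cp hello.py gs://my-bucket/dependencies/hello.py
# Submit the batch using the gs:// URI instead of the local file
gcloud dataproc batches submit pyspark gs://my-bucket/dependencies/hello.py --batch=hello-batch-6 --region=us-central1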

Airflow: Is there a way to copy variables from one Composer version to another Composer version using Cloud Shell?

I am working on a migration task, which involves copying variables from one Cloud Composer version to another Cloud Composer version using Cloud Shell.
Is it possible to accomplish such a task in Cloud Shell?
I have read the Airflow and Composer documentation, but I cannot seem to find a working command that will allow me to copy variables from one Composer version to another.
There isn't a single Airflow CLI command to "move" Variables from one Airflow environment to another; however, you can export the Variables from the source environment to a file and then import that file into the target environment.
Something like this:
Export from source environment
gcloud composer environments run SOURCE_ENVIRONMENT_NAME \
--location SOURCE_LOCATION \
variables export \
my_file.json
Import to target environment
gcloud composer environments run TARGET_ENVIRONMENT_NAME \
--location TARGET_LOCATION \
variables import \
my_file.json
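Note that, as explained in the first answer above, these commands run inside the Airflow workers, so the exported file ends up in the environment's bucket rather than in Cloud Shell. A rough sketch of the full round trip, assuming placeholder environment names and using the data/ folder (mounted at /home/airflow/gcs/data in the workers):
# Export variables to the source environment's GCS-mounted data/ folder
gcloud composer environments run SOURCE_ENVIRONMENT_NAME --location SOURCE_LOCATION variables export -- /home/airflow/gcs/data/variables.json
# Download the exported file from the source bucket to Cloud Shell
gcloud composer environments storage data export --environment SOURCE_ENVIRONMENT_NAME --location SOURCE_LOCATION --source variables.json --destination .
# Upload it to the target environment's data/ folder
gcloud composer environments storage data import --environment TARGET_ENVIRONMENT_NAME --location TARGET_LOCATION --source variables.json
# Import the variables inside the target environment
gcloud composer environments run TARGET_ENVIRONMENT_NAME --location TARGET_LOCATION variables import -- /home/airflow/gcs/data/variables.json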

Check to see if a specific DAG is present or not in Composer using Cloud Shell?

I am trying to use a Cloud Shell command to check whether a specific DAG is present (i.e., whether it exists), but I am unsure what command to write to accomplish this.
For example, if I wanted to check whether my_dag.py exists, I would like Cloud Shell to indicate whether it is present or not.
So far, I have used this command, but it lists all of the DAGs:
gcloud composer environments storage dags list --environment=ENVIRONMENT --location=LOCATION
I want to specifically check whether a certain DAG exists using Cloud Shell, but I do not know what command to use.
The airflow dags list command (which is what this Cloud Shell command is using) has a --subdir option where you can specify the file path to a DAG file. The output will be the DAGs tied to that fileloc.
gcloud composer environments storage dags list --environment=ENVIRONMENT --location=LOCATION --subdir dags/path/to/file.py
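Another simple option from Cloud Shell (just a sketch, not part of the answer above) is to list the DAG files in the environment's bucket and grep for the file name:
# Prints whether my_dag.py appears in the environment's dags/ folder
gcloud composer environments storage dags list --environment=ENVIRONMENT --location=LOCATION | grep -q my_dag.py && echo "my_dag.py is present" || echo "my_dag.py is not present"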

Google Cloud Platform: cloudshell - is there any way to "keep" gcloud init configs?

Does anyone know of a way to persist configurations done using "gcloud init" commands inside cloudshell, so they don't vanish each time you disconnect?
I figured out how to persist Python pip installs using the --user flag,
for example: pip install --user pandas
But, when I create a new configuration using gcloud init, use it for a bit, close cloudshell (or cloudshell times out on me), then reconnect later, the configurations are gone.
Not a big deal, I bounce between projects/etc so it's nice to have the configs saved so I can simply run
gcloud config configurations activate config-name
Thanks...Rich Murnane
Google Cloud Shell only persists data in your $HOME directory. Commands like gcloud init modify the environment variables and store configuration files in /tmp which is deleted when the VM is restarted. The VM is terminated after being idle for 20 minutes or 60 minutes depending on which document you read.
Google Cloud Shell is a Docker container. You can modify the docker image to customize to fit your needs. This method will allow you to install packages, tools, etc that are not located in your $HOME directory.
You can also store your files and configuration scripts on Google Cloud Storage. Modify .bashrc to download your cloud files and run your configuration script.
Either method will allow you to create a persistent environment.
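A rough sketch of the .bashrc approach mentioned above (the bucket name and script contents are assumptions): pull a setup script from Cloud Storage at the start of each session and run it.
# Appended to ~/.bashrc, which lives in the persistent $HOME directory
gsutil -q cp gs://my-config-bucket/cloudshell-setup.sh "$HOME/cloudshell-setup.sh" && bash "$HOME/cloudshell-setup.sh"
# cloudshell-setup.sh could then recreate the gcloud configuration, for example:
# gcloud config configurations create config-name || true
# gcloud config configurations activate config-name
# gcloud config set project second-project-name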
This StackOverflow answer covers in detail what gcloud init does and how to basically emulate the same thing via script or command line.
gcloud init details
This isn't exactly what I wanted, but since my account (userid) isn't changing, I'm simply going to do the command
gcloud config set project second-project-name
good enough, thanks...Rich

GCP, Composer, Airflow, Operators

Google Cloud Composer uses Cloud Storage to store Apache Airflow DAGs. However, where are the operators stored? I am getting the error below:
Broken DAG: [/home/airflow/gcs/dags/example_pubsub_flow.py] cannot import name PubSubSubscriptionCreateOperator.
This operator was added in Airflow 1.10.0. As of today, Cloud Composer is still using Airflow 1.9.0, so this operator is not available yet. You can add it as a plugin.
Apparently, according to this message in the Composer Google Group, installing a contrib operator as a plugin does not require adding the Plugin boilerplate.
It is enough to register the plugin via this command:
gcloud beta composer environments storage plugins import --environment dw --location us-central1 --source=custom_operators.py
See here for detail.
The drawback is that if your contrib operator uses other contrib modules, you will have to copy those as well and modify the way they are imported in Python, using:
from my_custom_operator import MyCustomOperator
instead of:
from airflow.contrib.operators.my_custom_operator import MyCustomOperator
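As a rough sketch of that workflow (the file names below are placeholders for your copied and adjusted modules; the PubSub operator, for instance, also imports a contrib hook), you would register each copied file the same way:
# Register the copied operator and the hook it depends on as plugins
gcloud beta composer environments storage plugins import --environment dw --location us-central1 --source=pubsub_operator.py
gcloud beta composer environments storage plugins import --environment dw --location us-central1 --source=gcp_pubsub_hook.py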