I am new to Google Cloud Composer. I have some code on a Google Compute Engine instance,
for example: test.py
Currently I am using Jenkins as my scheduler, and I'm running the code like this:
echo "cd /home/user/src/digital_platform && /home/user/venvs/bdp/bin/python -m test.test.test" | ssh user@instance-dp
I want to run the same code from Google Cloud Composer.
How can I do that?
Basically I need to SSH into an instance in Google Cloud and run the code in an automated way using Cloud Composer.
It seems that the SSHOperator might work for you. This operator is an Airflow feature, not a Cloud Composer feature per se.
The other operator that you might want to take a look at before making your final decision is the BashOperator.
You need to create a DAG (workflow); Cloud Composer schedules only the DAGs that are in the DAGs folder in the environment's Cloud Storage bucket. Each Cloud Composer environment has a web server that runs the Airflow web interface, which you can use to manage DAGs.
The BashOperator is useful for running command-line programs. I suggest you follow the Cloud Composer Quickstart, which shows you how to create a Cloud Composer environment in the Google Cloud Console and run a simple Apache Airflow DAG.
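As a rough sketch, a DAG using the SSHOperator to run the same command that Jenkins currently runs could look like the following. The connection id ssh_to_instance_dp is an assumption (you would first create an SSH connection to instance-dp in Airflow), and the import path depends on your Airflow version.

from datetime import datetime

from airflow import DAG
# On Airflow 1.10 (older Composer images) the import is instead:
# from airflow.contrib.operators.ssh_operator import SSHOperator
from airflow.providers.ssh.operators.ssh import SSHOperator

with DAG(
    dag_id="run_test_on_instance_dp",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",   # match your Jenkins schedule
    catchup=False,
) as dag:
    run_test = SSHOperator(
        task_id="run_test",
        ssh_conn_id="ssh_to_instance_dp",  # assumed Airflow SSH connection to instance-dp
        command=(
            "cd /home/user/src/digital_platform && "
            "/home/user/venvs/bdp/bin/python -m test.test.test"
        ),
    )

The DAG file itself goes into the environment's DAGs folder in Cloud Storage, as described above.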
Related
I would like to trigger a Python script on a Cloud VM from a Cloud Function. I am able to start the Cloud VM from the Cloud Function using the client library; however, I am not able to run the 'gcloud ssh' command from the Cloud Function.
Is there a way to achieve this? One option is to use the Python script as the VM startup script, but I am trying not to do that.
For instance, I want to run the following gcloud CLI command,
gcloud run services delete [SERVICE]
but from a triggered Google Cloud Function.
I've looked in a few places and have found a few similar things:
https://www.googlecloudcommunity.com/gc/Infrastructure-Compute-Storage/Automatic-Resource-Deletion/m-p/172865
https://github.com/PolideaInternal/cats-love-money
Create a Google function from a Google cloud function
But I find them a bit tricky to follow and/or replicate.
The Google Cloud CLI is a Python program. That means a lot of dependencies and a requirement for a shell and OS environment. Cloud Functions does not provide either.
A better option for running the CLI is Cloud Run. This provides the additional benefit of being able to test the Cloud Run container locally. You will need to wrap the CLI with an HTTP server that responds to HTTP requests and then executes the CLI.
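As an illustration, here is a minimal sketch of such a wrapper, assuming a Flask app running in a container image that has the Cloud SDK installed; the route, request fields, and default region are placeholders.

import subprocess

from flask import Flask, request

app = Flask(__name__)

@app.route("/delete-service", methods=["POST"])    # hypothetical route
def delete_service():
    payload = request.get_json()
    service = payload["service"]                    # hypothetical request field
    region = payload.get("region", "us-central1")   # placeholder default region
    # Shell out to the gcloud CLI installed in the container image.
    result = subprocess.run(
        ["gcloud", "run", "services", "delete", service,
         "--region", region, "--quiet"],
        capture_output=True, text=True,
    )
    return {
        "returncode": result.returncode,
        "stdout": result.stdout,
        "stderr": result.stderr,
    }

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)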
However, most CLI commands can easily be duplicated with the Python SDKs and/or direct REST API calls, which are supported by Cloud Functions. This will require a solid understanding of the services, APIs, and language.
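For example, a sketch of the SDK approach for the delete command above, assuming the google-cloud-run client library (run_v2) and placeholder project, region, and service names:

# requirements.txt for the Cloud Function would need: google-cloud-run
from google.cloud import run_v2

def delete_service(request):
    """HTTP-triggered Cloud Function entry point (the name is an assumption)."""
    client = run_v2.ServicesClient()
    # Fully qualified resource name; project, region, and service are placeholders.
    name = "projects/my-project/locations/us-central1/services/my-service"
    operation = client.delete_service(name=name)
    operation.result()  # block until the long-running delete completes
    return f"Deleted {name}"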
I used Deployment Manager to create a LAMP Stack for phpMyAdmin. Is it possible to access files on the VM from the Google Cloud Shell? If so, how would I navigate to the files in Google Cloud Shell?
When you start Cloud Shell, it provisions a Google Compute Engine virtual machine running a Debian-based Linux operating system. Cloud Shell instances are provisioned on a per-user, per-session basis. The instance persists while your Cloud Shell session is active; after an hour of inactivity, your session terminates and its VM is discarded. For more on usage quotas, refer to the limitations guide 1.
Yes, you can access your LAMP VM instance from Cloud Shell using the command shown below:
gcloud beta compute ssh --zone "us-central1-a" "vm-name" --project "project-id"
Note: please replace the zone, vm-name, and project-id with your own values.
Please follow link 2 for more information in the Cloud Shell how-to guides.
I've been tasked with automating the scheduling of some notebooks on AI Platform Notebooks that are run daily via the Papermill operator, but actually doing this through Cloud Composer is giving me some trouble.
Any help is appreciated!
The first step is to create a JupyterLab notebook. If you want to use additional libraries, install them and restart the kernel (the Restart Kernel and Clear All Outputs option). Then define the processing inside your notebook.
When it's ready, remove all of the runs, peeks, and dry runs before you start the scheduling phase.
Now you need to set up a Cloud Composer environment (remember to install the additional packages that you defined in the first step). To schedule the workflow, go to JupyterLab and create a second notebook which generates the DAG from the workflow.
The final step is to upload the zipped workflow to the Cloud Composer DAGs folder. You can manage your workflow using the Airflow UI.
I recommend you take a look at this article.
Another solution that you can use is Kubeflow, which aims to make running ML workloads on Kubernetes simple. Kubeflow adds some resources to your cluster to assist with a variety of tasks, including training and serving models and running Jupyter notebooks. You can find an interesting tutorial on codelabs.
I hope you find the above pieces of information useful.
This blog post on Medium, "How to Deploy and Schedule Jupyter Notebook on Google Cloud Platform", describes how to run Jupyter notebook jobs on a Compute Engine instance and schedule them using GCP's Cloud Scheduler > Cloud Pub/Sub > Cloud Functions. (Unfortunately the post may be paywalled.)
If you must use Cloud Composer, then you might find this answer to a related question, "ETL in Airflow aided by Jupyter Notebooks and Papermill," useful.
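For reference, a minimal sketch of a DAG that runs a notebook daily with the PapermillOperator. The GCS paths are placeholders, and the papermill provider package (plus papermill with GCS support) would need to be installed in the Composer environment.

from datetime import datetime

from airflow import DAG
# On Airflow 1.10 the import is instead:
# from airflow.operators.papermill_operator import PapermillOperator
from airflow.providers.papermill.operators.papermill import PapermillOperator

with DAG(
    dag_id="run_notebook_daily",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    run_notebook = PapermillOperator(
        task_id="run_notebook",
        input_nb="gs://my-bucket/notebooks/input.ipynb",             # placeholder path
        output_nb="gs://my-bucket/notebooks/output-{{ ds }}.ipynb",  # one output per run
        parameters={"run_date": "{{ ds }}"},                         # injected into the notebook's parameters cell
    )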
I wrote a small plugin for Apache Airflow, which runs fine on my local deployment. However, when I use Google Cloud Composer, the user interface hangs and becomes unresponsive. Is there any way to restart the webserver in Cloud Composer?
(Note: This answer is currently more suggestive than finalized.)
As far as restarting the webserver goes...
What doesn't work:
I reviewed the Airflow Web Interface page in the docs, which describes using the webserver but not accessing it from a CLI or restarting it.
While you can also run Airflow CLI commands on Composer, I don't see a command for restarting the webserver in the Airflow CLI today.
I checked the gcloud CLI in the Google Cloud SDK but didn't find a restart-related command.
Here are a few ideas that may work for restarting the Airflow webserver on Composer:
In the gcloud CLI, there's an update command to change environment properties. I would assume that it restarts the scheduler and webserver (in new containers) after you change one of these properties so the new setting takes effect. You could set an arbitrary environment variable to force a change, but just running the update command with no changes may work.
gcloud beta composer environments update ...
Alternatively, you can update environment properties (excluding environment variables) in the GCP Console.
I think re-running the import plugins command would cause a scheduler/webserver restart as well.
gcloud beta composer environments storage plugins import ...
In a more advanced setup, Composer supports deploying a self-managed Airflow webserver. Following the linked guide, you can connect to your Composer environment's GKE cluster, create deployment and service Kubernetes configuration files for the webserver, and deploy both with kubectl create. Then you could run kubectl replace or kubectl delete on the pod to trigger a fresh start.
This all feels like a bit much, so hopefully documentation or a simpler way to restart the webserver emerges to supersede these workarounds.