Using Cloud Functions as operators in a GC Composer DAG - google-cloud-platform

Fellow coders,
For a project I'm interested in using Google Cloud Composer to handle several workflows that consist of operations that can be shared between workflows.
It seems to me that Cloud Functions are a perfect way of implementing these shared operations as tasks in a Composer DAG.
From what I understand, I would need an operator that invokes a cloud function with data specific to the task in the given DAG.
I found Google Cloud Functions operators in the Airflow documentation; however, these only deploy and delete cloud functions, they do not invoke them.
A lot has been written about invoking DAGs from a cloud function, but nothing seems to be written about using cloud functions as operations within a DAG.
Example use case:
Every time a document is placed in a certain bucket I want to start a DAG workflow to analyse this document.
This DAG can consist of various tasks, such as extraction of the sender of the document, classification of a logo, or searching for specific words. For these separate tasks, I want to create separate cloud functions that are stitched together in a DAG to compose my workflows.
Question:
How to invoke cloud functions from within a Google Composer DAG?
Do people have experience with this or have sample code available?
Thanks in advance.

HTTP triggers can be used to run Cloud Functions, so you can invoke them from a DAG using the HTTP operator. The DAG runs the task, which calls the Cloud Function's HTTP trigger, which in turn runs the function.
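A minimal sketch of what that can look like, assuming the Airflow HTTP provider is available in your Composer environment; the connection id, function name, and payload below are placeholders:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.http.operators.http import SimpleHttpOperator

with DAG(
    dag_id="document_analysis",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
) as dag:
    # Assumes an Airflow HTTP connection named "gcf_conn" whose host is
    # https://<region>-<project>.cloudfunctions.net (placeholder values).
    extract_sender = SimpleHttpOperator(
        task_id="extract_sender",
        http_conn_id="gcf_conn",
        endpoint="extract-sender",  # hypothetical function name
        method="POST",
        data='{"document": "gs://my-bucket/incoming/doc.pdf"}',
        headers={"Content-Type": "application/json"},
    )
```

Note that if the function does not allow unauthenticated invocations, the request also needs a Google-signed identity token for the function's URL; that part is omitted from the sketch.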

Related

Creating dynamically scheduled functions

I have a list of 10 timestamps which keeps updating dynamically. In total there are 3 such lists for 3 users. I want to build a utility to trigger a function at the next upcoming timestamp (preferably everything on serverless compute).
I am stuck on how to achieve this on AWS or Firebase.
On Firebase/Google Cloud Functions the two most common options are either to store the schedule in a database and then periodically trigger a Cloud Function and run the tasks that are due, or to use Cloud Tasks to dynamically schedule a callback to a separate Cloud Function for each task.
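A minimal sketch of the first option, assuming the schedule lives in a Firestore collection and the function runs every minute on a scheduled trigger; the collection and field names are placeholders:

```python
from datetime import datetime, timezone

from google.cloud import firestore

db = firestore.Client()

def check_due_tasks(event, context):
    """Scheduled (e.g. every minute): run whatever timestamps are due."""
    now = datetime.now(timezone.utc)
    due = (
        db.collection("schedules")
        .where("runAt", "<=", now)
        .where("done", "==", False)
        .stream()
    )
    for doc in due:
        print(f"running task {doc.id}")  # stand-in for the actual work
        doc.reference.update({"done": True})
```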
I recommend also reading:
Doug's blog post on How to schedule a Cloud Function to run in the future with Cloud Tasks (to build a Firestore document TTL)
Fireship.io's tutorial on Dynamic Scheduled Background Jobs in Firebase
How can scheduled Firebase Cloud Messaging notifications be made outside of the Firebase Console?
Previous questions on dynamically scheduling functions, as this has been covered quite well before.
Update (late 2022): there is now also a built-in way to schedule Cloud Functions dynamically: enqueue functions with Cloud Tasks.
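For the Cloud Tasks option, a minimal sketch (project, location, queue, and the callback function's URL are placeholders) that enqueues one HTTP callback per timestamp:

```python
import datetime

from google.cloud import tasks_v2
from google.protobuf import timestamp_pb2

client = tasks_v2.CloudTasksClient()
parent = client.queue_path("my-project", "us-central1", "my-queue")  # placeholders

def schedule_callback(run_at: datetime.datetime, payload: bytes) -> None:
    """Enqueue a task that POSTs to a Cloud Function at `run_at`."""
    schedule_time = timestamp_pb2.Timestamp()
    schedule_time.FromDatetime(run_at)
    task = {
        "http_request": {
            "http_method": tasks_v2.HttpMethod.POST,
            # Hypothetical HTTP-triggered function that does the work:
            "url": "https://us-central1-my-project.cloudfunctions.net/run-task",
            "body": payload,
            "headers": {"Content-Type": "application/json"},
        },
        "schedule_time": schedule_time,
    }
    client.create_task(request={"parent": parent, "task": task})
```

Each timestamp in your lists then becomes one task in the queue; when a list changes, you create or delete the corresponding tasks.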

Is it possible to do micro-batching in serverless platforms like cloud run?

I heavily use Google Cloud Run, for many reasons; one of them is the simplicity of treating each request as stateless and handling it individually.
However, I was thinking recently that for a service we have which simply writes data to a DB, it would be very handy to batch a few requests rather than write each one individually. Is this possible via serverless platforms, specifically Cloud Run?
Because Cloud Run is stateless, you can't hold on to requests (i.e., keep them, which would be stateful) and process them later. You need an intermediary layer for that.
One good way, which I have already implemented, is to publish the requests to Pub/Sub (either directly, or by using a Cloud Run or Cloud Function service that receives each request and transforms it into a Pub/Sub message).
Then you can create a Cloud Scheduler job that triggers a Cloud Run service. This Cloud Run service pulls from the Pub/Sub subscription and reads a batch of messages (maybe all of them). You then have all the "requests" in a batch and can process them inside the Cloud Scheduler-triggered request (don't forget that you can't process in the background with Cloud Run; you must be in a request context, for now ;) ).
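A minimal sketch of that batch endpoint, assuming a Flask-based Cloud Run service and a pull subscription; the project id, subscription name, and the DB write are placeholders:

```python
from flask import Flask
from google.cloud import pubsub_v1

app = Flask(__name__)
subscriber = pubsub_v1.SubscriberClient()
# Placeholders: your project and a pull subscription fed by the topic.
subscription = subscriber.subscription_path("my-project", "pending-writes")

@app.route("/process-batch", methods=["POST"])
def process_batch():
    # Synchronously pull up to 100 buffered messages.
    response = subscriber.pull(
        request={"subscription": subscription, "max_messages": 100}
    )
    rows = [m.message.data for m in response.received_messages]
    if rows:
        print(f"writing {len(rows)} rows in one batch")  # stand-in for the DB write
        subscriber.acknowledge(
            request={
                "subscription": subscription,
                "ack_ids": [m.ack_id for m in response.received_messages],
            }
        )
    return f"processed {len(rows)} messages", 200
```

Cloud Scheduler then POSTs to /process-batch on whatever interval matches your write volume.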
You could give these blog posts a try; from my reading, it looks like you can pull some good ideas from them:
Running a serverless batch workload on GCP with Cloud Scheduler, Cloud Functions, and Compute Engine
Batching Jobs in GCP Using the Cloud Scheduler and Functions
Here is another Stack Overflow thread that shows a similar approach.

Amazon Systems Manager alternative on GCP

Is there a solution/service available on GCP in similar lines of Systems Manager?
My end goal is to run a shell script on GCP VM on specific events.
For example, on AWS I was able to trigger a Lambda function via EventBridge, and the function in turn triggered an SSM command for a specific VM.
Is this possible on GCP?
There isn't a Systems Manager equivalent in GCP.
A Pub/Sub subscription on the VMs/compute units that triggers a Lambda function (a Cloud Function in GCP) is a suboptimal solution and different from what Systems Manager accomplishes.
I don't know what kind of events you have in mind that would trigger running a script, but you can check out the tutorial on how to run a function using Pub/Sub. It shows how to use scheduler-based events, but it's possible to use non-scheduled triggers:
Events are things that happen within your cloud environment that you might want to take action on. These might be changes to data in a database, files added to a storage system, or a new virtual machine instance being created. Currently, Cloud Functions supports events from the following providers:
HTTP
Cloud Storage
Cloud Pub/Sub
Cloud Firestore
Firebase (Realtime Database, Storage, Analytics, Auth)
Stackdriver Logging—forward log entries to a Pub/Sub topic by creating a sink. You can then trigger the function.
And here you can read about how to implement those triggers.
For example, this documentation explains how to use storage-based triggers with Pub/Sub.
If you provide more details of what exactly you want to achieve (which events have to trigger what), then I can point you to a more direct solution.
The approach depends on the exact use case you have in hand. One common architectural option is using Pub/Sub with Cloud Functions: based on messages published to Pub/Sub topics, Cloud Functions performing the operations of interest can be triggered/invoked in the same cloud project.
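As a minimal skeleton of that pattern, a (1st gen) Pub/Sub-triggered Cloud Function looks like this; what it then does, e.g. acting on a VM, is left as a placeholder, since GCP has no built-in "run a shell script on a VM" primitive:

```python
import base64

def handle_event(event, context):
    """Triggered by a message published to the function's Pub/Sub topic."""
    payload = ""
    if "data" in event:
        payload = base64.b64decode(event["data"]).decode("utf-8")
    print(f"Received event {context.event_id}: {payload}")
    # Placeholder: act on the VM here, e.g. via SSH or by updating a
    # startup script; there is no SSM RunCommand equivalent.
```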

AWS Step Functions - is there such a solution for Google Cloud Platform?

At the moment I am investigating the possibility and the proper way of migrating complex web applications from AWS to GCP. There are actually no issues with mapping general compute and networking services from one provider to another, but I wonder if GCP has a service similar to AWS Step Functions. I've already taken a look at Google Dataflow and Google Cloud Tasks. The latter seems to be something like that, but I am not sure if it's the optimal solution.
So the question is: which Google service provides the same functionality as AWS Step Functions? And if there is none, which combination of services would you recommend to achieve effective orchestration of distributed tasks (primarily cloud functions)?
Thanks!
2021 Update
As Brian de Alwis noted below, since this answer was written Cloud Workflows is now generally available and is functionally similar to Step Functions.
2019 Answer
As far as I'm aware there's nothing specifically like Step Functions, but I have two strategies for creating these types of micro-service systems on Google Cloud.
Strategy 1: Cloud Run/Cloud Functions with Pub/Sub
Here I'd create microservices using Cloud Run or Cloud Functions and subscribe these functions to Pub/Sub topics. That means that when Function A executes and completes its work, it publishes a message to a specific topic with a data packet that any function subscribed to that topic will receive and act on.
For example, you could create two topics named FunctionASuccess and FunctionAError and create two separate functions that subscribe to one or the other and handle the success and error use cases.
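A minimal sketch of Function A under this strategy; the project id and the actual work are placeholders:

```python
import base64
import json

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
PROJECT = "my-project"  # placeholder

def function_a(event, context):
    """Pub/Sub-triggered; routes its outcome to a success or error topic."""
    data = json.loads(base64.b64decode(event["data"]))
    try:
        result = {"processed": data}  # stand-in for the real work
        topic = publisher.topic_path(PROJECT, "FunctionASuccess")
        payload = result
    except Exception as exc:
        topic = publisher.topic_path(PROJECT, "FunctionAError")
        payload = {"error": str(exc), "input": data}
    publisher.publish(topic, json.dumps(payload).encode("utf-8"))
```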
Strategy 2: Firebase Functions with Firestore/Realtime Database
Similarly to the above, I create Firebase Functions that watch for changes in Firestore or in the RTDB.
When Function A executes and completes its task, it saves a document to the FunctionAResults collection in Firestore or the RTDB. Functions that are subscribed to changes in the FunctionAResults collection are then executed and take the work to the next step.
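A minimal sketch of the downstream trigger, written here as a (1st gen) Python Cloud Function Firestore trigger rather than with the Node Firebase Functions SDK; the collection name matches the example above and the project id is a placeholder:

```python
def on_function_a_result(data, context):
    """Fires when a document is created in FunctionAResults.

    Deployed with, for example:
      gcloud functions deploy on_function_a_result \
        --runtime python39 \
        --trigger-event providers/cloud.firestore/eventTypes/document.create \
        --trigger-resource "projects/my-project/databases/(default)/documents/FunctionAResults/{docId}"
    """
    print(f"New result document: {context.resource}")
    _ = data["value"]["fields"]  # the created document's fields
    # Placeholder: kick off the next step of the workflow here.
```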
They both work reliably so I have no preference, but I typically go with the 2nd strategy if I'm utilizing other Firebase services.
Cloud Workflows was announced at Cloud Next On Air 2020.
You're looking for Cloud Composer. It's based on the open-source project Apache Airflow, which allows you to define and orchestrate workflows in a similar way to Step Functions.

Using google_cloud_scheduler_job to schedule batch jobs

I'm trying to schedule a batch job using google_cloud_scheduler_job terraform resource.
As per the document https://www.terraform.io/docs/providers/google/r/cloud_scheduler_job.html, I see only the following options:
PubSub target
HTTP target
AppEngine target
Any suggestions on how to create a batch job scheduler using google_cloud_scheduler_job? Thanks.
Let us split the story into two parts. Let us assume a function ... that, when called, will initiate your batch job. You can write this function in a variety of programming languages ... in this example, we'll assume Node. In your Node function, you could (for example) call the Dataproc Node.js client's submitJob function to instantiate a Dataproc job.
Now the question changes from "How do I schedule the execution of my batch job?" to "How do I schedule the execution of a function (which executes the batch job)?". And here is where the combination of Google Cloud Scheduler and Google Cloud Functions comes into play. Google Cloud Functions allows you to write a code function which is externally triggered by an arriving event. Such an event could be an HTTP request (as a webhook) or a Pub/Sub message. And where can these events come from? The answer is Google Cloud Scheduler. Once you have created your function, you can define that the function be executed/triggered on a schedule. The result of all of this appears to be exactly what you're asking for.
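A minimal sketch of such a function, written here in Python rather than Node; the project, region, cluster, and job spec are placeholders, and Cloud Scheduler would be configured to POST to this function's trigger URL on a cron schedule:

```python
from google.cloud import dataproc_v1

def start_batch_job(request):
    """HTTP-triggered entry point that submits a Dataproc job."""
    client = dataproc_v1.JobControllerClient(
        client_options={"api_endpoint": "us-central1-dataproc.googleapis.com:443"}
    )
    job = {
        "placement": {"cluster_name": "my-cluster"},
        "spark_job": {
            "main_class": "com.example.BatchJob",               # placeholder
            "jar_file_uris": ["gs://my-bucket/batch-job.jar"],  # placeholder
        },
    }
    client.submit_job(
        request={"project_id": "my-project", "region": "us-central1", "job": job}
    )
    return "job submitted", 200
```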
A tutorial illustrating Cloud Scheduler and Cloud Functions interacting can be found here.