Using google_cloud_scheduler_job to schedule batch jobs - google-cloud-platform

I'm trying to schedule a batch job using the google_cloud_scheduler_job Terraform resource.
As per the documentation at https://www.terraform.io/docs/providers/google/r/cloud_scheduler_job.html, I see only the following target options:
PubSub target
HTTP target
AppEngine target
Any suggestions on how to create a batch job scheduler using google_cloud_scheduler_job? Thanks.

Let us split the story into two parts. Assume a function that, when called, will initiate your batch job. You can write this function in a variety of programming languages; in this example, we'll assume Node. In your Node function, you could (for example) call the Dataproc Node.js submitJob function to instantiate a Dataproc job.
Now the question changes from "How do I schedule the execution of my batch job?" to "How do I schedule the execution of a function (which executes the batch job)?". And here is where the combination of Google Cloud Scheduler and Google Cloud Functions comes into play. Google Cloud Functions allows you to write a code function which is externally triggered by an arriving event. Such an event could be an HTTP request (as a webhook) or a Pub/Sub message. And where can these events come from? The answer is Google Cloud Scheduler. Once you have created your function, you can define that it be executed/triggered on a schedule. And the result of all of this appears to be exactly what you are asking for.
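The answer above assumes Node; as a rough illustration of the same idea, a minimal Python sketch of a Pub/Sub-triggered Cloud Function that submits a Dataproc job might look like this (project, region, cluster, and file names are hypothetical placeholders):

```python
import base64

from google.cloud import dataproc_v1  # pip install google-cloud-dataproc

PROJECT = "my-project"        # hypothetical
REGION = "us-central1"        # hypothetical
CLUSTER = "my-batch-cluster"  # hypothetical


def start_batch_job(event, context):
    """Pub/Sub-triggered Cloud Function (1st gen signature).

    Cloud Scheduler publishes a message on a schedule; this function reacts
    to that message by submitting a Dataproc job.
    """
    if event.get("data"):
        print("Scheduler payload:", base64.b64decode(event["data"]).decode("utf-8"))

    client = dataproc_v1.JobControllerClient(
        client_options={"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}
    )
    job = {
        "placement": {"cluster_name": CLUSTER},
        "pyspark_job": {"main_python_file_uri": "gs://my-bucket/batch_job.py"},  # hypothetical
    }
    response = client.submit_job(
        request={"project_id": PROJECT, "region": REGION, "job": job}
    )
    print("Submitted Dataproc job", response.reference.job_id)
```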
A tutorial illustrating Cloud Scheduler and Cloud Functions interacting can be found here.

Related

Creating dynamically scheduled functions

I have a list of 10 timestamps which keeps updating dynamically. In total there are 3 such lists for 3 users. I want to build a utility that triggers a function at the next upcoming timestamp (preferably all on serverless compute).
I am stuck on how to achieve this on AWS or Firebase.
On Firebase/Google Cloud Functions the two most common options are either to store the schedule in a database and then periodically trigger a Cloud Function and run the tasks that are due, or to use Cloud Tasks to dynamically schedule a callback to a separate Cloud Function for each task.
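As a sketch of the second option, the snippet below uses the Cloud Tasks client to enqueue an HTTP task that calls another Cloud Function at a specific future timestamp; the project, queue, and callback URL are hypothetical, and each timestamp in a user's list would get its own task:

```python
import datetime

from google.cloud import tasks_v2         # pip install google-cloud-tasks
from google.protobuf import timestamp_pb2

PROJECT = "my-project"    # hypothetical
LOCATION = "us-central1"  # hypothetical
QUEUE = "user-callbacks"  # hypothetical
CALLBACK_URL = "https://us-central1-my-project.cloudfunctions.net/runScheduledTask"  # hypothetical


def schedule_callback(run_at: datetime.datetime, payload: bytes) -> str:
    """Create a Cloud Task that POSTs `payload` to CALLBACK_URL at `run_at` (UTC)."""
    client = tasks_v2.CloudTasksClient()
    parent = client.queue_path(PROJECT, LOCATION, QUEUE)

    schedule_time = timestamp_pb2.Timestamp()
    schedule_time.FromDatetime(run_at)

    task = {
        "http_request": {
            "http_method": tasks_v2.HttpMethod.POST,
            "url": CALLBACK_URL,
            "body": payload,
            "headers": {"Content-Type": "application/json"},
        },
        "schedule_time": schedule_time,
    }
    return client.create_task(request={"parent": parent, "task": task}).name
```

When a list changes, one way to handle it is to delete the affected tasks (by the name returned above) and re-create them with the new timestamps.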
I recommend also reading:
Doug's blog post on How to schedule a Cloud Function to run in the future with Cloud Tasks (to build a Firestore document TTL)
Fireship.io's tutorial on Dynamic Scheduled Background Jobs in Firebase
How can scheduled Firebase Cloud Messaging notifications be made outside of the Firebase Console?
Previous questions on dynamically scheduling functions, as this has been covered quite well before.
Update (late 2022): there is now also a built-in way to schedule Cloud Functions dynamically: enqueue functions with Cloud Tasks.

Is it possible to do micro-batching in serverless platforms like cloud run?

I heavily use Google Cloud Run, for many reasons; one of them is the simplicity of treating each request as stateless and handling it individually.
However, I was thinking recently that for a service we have which simply writes data to a DB, it would be very handy to batch a few requests rather than write each one individually. Is this possible via serverless platforms, specifically Cloud Run?
Because Cloud Run is stateless, you can't hold on to the requests (i.e. keep them, which would be stateful) and process them later. You need an intermediary layer for that.
One good way, which I have already implemented, is to publish the requests to Pub/Sub (either directly, or by having a Cloud Run service or Cloud Function receive each request and turn it into a Pub/Sub message).
Then you can create a Cloud Scheduler job that triggers a Cloud Run service. That service pulls the Pub/Sub subscription and reads a bunch of messages (maybe all of them). You then have all the "requests" as a batch and can process them inside the Cloud Scheduler-initiated request (don't forget that you can't process in the background with Cloud Run; you must be in a request context, for now at least ;) ).
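A minimal sketch of that batching endpoint, assuming a hypothetical subscription name and a hypothetical write_batch_to_db() helper: Cloud Scheduler POSTs to this Cloud Run endpoint, which synchronously pulls a batch of messages, writes them to the database in one go, then acknowledges them.

```python
import os

from flask import Flask
from google.cloud import pubsub_v1  # pip install google-cloud-pubsub

app = Flask(__name__)
SUBSCRIPTION = "projects/my-project/subscriptions/db-writes"  # hypothetical


def write_batch_to_db(rows):
    """Hypothetical helper that performs one batched DB write."""
    ...


@app.route("/drain", methods=["POST"])
def drain():
    subscriber = pubsub_v1.SubscriberClient()
    response = subscriber.pull(
        request={"subscription": SUBSCRIPTION, "max_messages": 100}
    )
    if not response.received_messages:
        return "nothing to do", 200

    # All processing happens inside this request, per the answer above.
    write_batch_to_db([m.message.data for m in response.received_messages])

    ack_ids = [m.ack_id for m in response.received_messages]
    subscriber.acknowledge(
        request={"subscription": SUBSCRIPTION, "ack_ids": ack_ids}
    )
    return f"processed {len(response.received_messages)} messages", 200


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
```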
I think you can give these blog posts a try; I've done some reading and it looks like you can pull some good ideas from them.
Running a serverless batch workload on GCP with Cloud Scheduler, Cloud Functions, and Compute Engine
Batching Jobs in GCP Using the Cloud Scheduler and Functions
Here is another Stack Overflow thread that shows a similar approach.

is it possible to configure Cloud Scheduler to trigger multiple functions in one Job?

I have 2 Cloud Functions that run every 5 minutes, currently using two different Cloud Scheduler jobs. Is it possible to configure Cloud Scheduler to run them both at the same time using only 1 job instead of 2?
You have several options. The 2 easiest are:
With Cloud Scheduler, publish a message to Pub/Sub instead of calling a Cloud Function. Then add 2 push subscriptions to the Pub/Sub topic to call your Cloud Functions. The message published to the topic is delivered to each subscription (here, 2), and thus the functions are called in parallel. Note: the Pub/Sub message format isn't the same as the payload you would otherwise POST to your function directly, so you need to rework that entry point part (a sketch of this follows after this list).
With Cloud Scheduler you can call Workflows, and in your workflow you can run tasks in parallel. I wrote an article on that this week.
In both cases, you can't do this out of the box; you need an intermediary component to fan out the single scheduling event.
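For the first option, a sketch of the reworked entry points (function and helper names hypothetical): each Cloud Function is subscribed to the topic that Cloud Scheduler publishes to, and the payload now arrives base64-encoded inside the Pub/Sub event rather than as a raw HTTP body.

```python
import base64
import json


def _decode(event):
    """Decode the Cloud Scheduler payload carried in the Pub/Sub message."""
    if "data" not in event:
        return {}
    return json.loads(base64.b64decode(event["data"]).decode("utf-8"))


def function_one(event, context):
    """Pub/Sub-triggered entry point (1st gen Cloud Functions signature)."""
    run_function_one(_decode(event))


def function_two(event, context):
    run_function_two(_decode(event))


def run_function_one(payload):
    """Hypothetical: the original logic of the first function."""
    ...


def run_function_two(payload):
    """Hypothetical: the original logic of the second function."""
    ...
```

Each function is deployed separately against the same topic, so a single Cloud Scheduler job publishing one message invokes both.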

Amazon Systems Manager alternative on GCP

Is there a solution/service available on GCP in similar lines of Systems Manager?
My end goal is to run a shell script on a GCP VM on specific events.
For example, on AWS, via EventBridge I was able to trigger a Lambda function, and the function in turn triggered an SSM command for a specific VM.
Is this possible on GCP?
There isn't a Systems Manager equivalent in GCP.
A Pub/Sub subscription from the VMs/compute units which triggers a Lambda function (a Cloud Function in GCP) is a suboptimal solution and different from what Systems Manager accomplishes.
I don't know what kind of events you have in mind that would trigger running a script, but you can check out the tutorial on how to run a function using Pub/Sub. It shows how to use scheduler-based events, but it's also possible to use non-scheduled triggers:
Events are things that happen within your cloud environment that you might want to take action on. These might be changes to data in a database, files added to a storage system, or a new virtual machine instance being created. Currently, Cloud Functions supports events from the following providers:
HTTP
Cloud Storage
Cloud Pub/Sub
Cloud Firestore
Firebase (Realtime Database, Storage, Analytics, Auth)
Stackdriver Logging—forward log entries to a Pub/Sub topic by creating a sink. You can then trigger the function.
And here you can read on how to implement those triggers.
For example, this documentation explains how to use storage-based triggers with Pub/Sub.
If you provide more details of what exactly you want to achieve (what events have to trigger what) then I can point you to a more direct solution.
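As an illustration of the logging-sink option from the list above, here is a minimal Python sketch (the field handling is illustrative, not exhaustive) of a Pub/Sub-triggered Cloud Function that reacts to forwarded log entries, e.g. when a new VM instance is created:

```python
import base64
import json


def on_log_entry(event, context):
    """Pub/Sub-triggered Cloud Function (1st gen signature).

    A logging sink forwards matching log entries to a Pub/Sub topic; each
    entry arrives here as a base64-encoded JSON LogEntry.
    """
    entry = json.loads(base64.b64decode(event["data"]).decode("utf-8"))

    resource = entry.get("resource", {})
    method = entry.get("protoPayload", {}).get("methodName", "")

    # Example reaction: act on newly created Compute Engine instances.
    if resource.get("type") == "gce_instance" and method.endswith("instances.insert"):
        instance_id = resource.get("labels", {}).get("instance_id", "unknown")
        print(f"New VM instance created: {instance_id}")
        # ...kick off whatever follow-up action you need here...
```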
The approach depends on the exact use case you have in hand. One common architectural option is to use Pub/Sub with Cloud Functions: based on messages published to Pub/Sub topics, Cloud Functions that perform the operations of interest can be triggered/invoked in the same cloud project.

Using Cloud Functions as operators in a GC Composer DAG

Fellow coders,
For a project I'm interested in using Google Cloud Composer to handle several workflows that consist of operations that can be shared between workflows.
It seems to me that Cloud Functions are a perfect way of performing these operations as tasks in a Composer DAG.
For what I understood of it, I would need an operator that invokes a cloud function with data that is specific for the task in the specific DAG.
I found a Google Cloud Function operator in the Airflow documentation; however, those operators are only for deploying and deleting Cloud Functions, not for invoking them.
A lot has been written about invoking DAGs from a cloud function, but nothing seems to be written about using cloud functions as operations within a DAG.
Example use case:
Every time a document is placed in a certain bucket I want to start a DAG workflow to analyse this document.
This DAG can consist of various tasks, such as extraction of the sender of the document, classification of a logo, or searching for specific words. For these separate tasks, I want to create separate cloud functions that are stitched together in a DAG to compose my workflows.
Question:
How to invoke cloud functions from within a Google Composer DAG?
Do people have experience with this or have sample code available?
Thanks in advance.
HTTP triggers can be used to run Cloud Functions, so you can invoke them from a DAG using the HTTP operator. The DAG runs the task, which calls the Cloud Function's HTTP trigger, which in turn runs the function.
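A minimal sketch of that approach (connection id, function names, and payload are hypothetical; if the functions require authentication you also need to attach an identity token, which is omitted here):

```python
import json
from datetime import datetime

from airflow import DAG
from airflow.providers.http.operators.http import SimpleHttpOperator

with DAG(
    dag_id="document_analysis",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,  # triggered externally, e.g. when a document lands in the bucket
    catchup=False,
) as dag:
    # Airflow HTTP connection "cloud_functions_http" points at
    # https://us-central1-my-project.cloudfunctions.net (hypothetical).
    extract_sender = SimpleHttpOperator(
        task_id="extract_sender",
        http_conn_id="cloud_functions_http",
        endpoint="extract-sender",   # hypothetical function name
        method="POST",
        data=json.dumps({"document": "{{ dag_run.conf.get('document') }}"}),
        headers={"Content-Type": "application/json"},
    )

    classify_logo = SimpleHttpOperator(
        task_id="classify_logo",
        http_conn_id="cloud_functions_http",
        endpoint="classify-logo",    # hypothetical function name
        method="POST",
        data=json.dumps({"document": "{{ dag_run.conf.get('document') }}"}),
        headers={"Content-Type": "application/json"},
    )

    extract_sender >> classify_logo
```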