Google Cloud Storage upload triggers Python app: alternatives to Cloud Functions - google-cloud-platform

I'd like to find out the best architecture for a Python app that is triggered when a file is uploaded to Google Cloud Storage, does some processing, and outputs a file to Google Drive.
I've tried using Cloud Functions, but I'm getting "Function invocation was interrupted. Error: memory limit exceeded." in the logs.
I've also followed the tutorial Trigger Cloud Run with events from Eventarc, so I know that one way is Eventarc with Cloud Audit Logs.
2 questions:
What other methods are there since I require higher memory limits?
How do I get the bucket name and file name from Cloud Audit Logs? Through protoPayload.resourceName?

You can use Pub/Sub. You can create a Pub/Sub notification on the bucket and create a push subscription to the service that you want (see the sketch below):
HTTP Cloud Function
App Engine
Cloud Run
Any HTTP service running anywhere (VM, Kubernetes, on-prem, ...)
Eventarc is mainly a wrapper around this process and can only call Cloud Run services (for now).
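To make the push-subscription route concrete (and to answer the second question), here is a minimal sketch of the receiving service in Python. Assumptions: Flask, a Pub/Sub push subscription pointing at this service, and a JSON-format notification on the bucket created with something like gsutil notification create -t my-topic -f json gs://my-bucket; the topic, bucket, and route names are placeholders.

```python
# Sketch: HTTP service (Cloud Run, App Engine, a VM, ...) that receives a
# Pub/Sub push request carrying a Cloud Storage notification.
import base64
import json

from flask import Flask, request

app = Flask(__name__)

@app.route("/", methods=["POST"])
def handle_push():
    envelope = request.get_json()
    message = envelope["message"]

    # The notification attributes carry the bucket and object names directly.
    attrs = message.get("attributes", {})
    bucket = attrs.get("bucketId")
    name = attrs.get("objectId")

    # The base64-encoded data field holds the full object metadata as JSON.
    if message.get("data"):
        metadata = json.loads(base64.b64decode(message["data"]).decode("utf-8"))
        bucket = bucket or metadata.get("bucket")
        name = name or metadata.get("name")

    # Do the heavy processing here; memory is only limited by the service you
    # choose (Cloud Run, App Engine, your own VM, ...).
    print(f"New object: gs://{bucket}/{name}")
    return ("", 204)
```

If you stay on the Eventarc/Cloud Audit Logs route instead, the bucket and object do show up in protoPayload.resourceName, in the form projects/_/buckets/BUCKET/objects/OBJECT, so you would parse them out of that field.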

Related

Why do we need Pub/Sub with Cloud Scheduler in GCP?

I am reading this https://cloud.google.com/scheduler/docs/tut-pub-sub
They use a setup like the one below:
Cloud Scheduler -> PubSub -> Cloud Function-> external Service
If I have a cron job that calls a service once a day, do I still need Pub/Sub in between?
I know there is an HTTP target type option in Cloud Scheduler, and I think the setup below, without Pub/Sub, is good enough.
Cloud Scheduler -> Cloud Function-> external Service
Could you give some advice on why I should or should not use Pub/Sub?
The example that you are looking at is "Using Pub/Sub to trigger a Cloud Function", so it includes Pub/Sub in its examples. Instead, you can deploy an HTTP Cloud Function and use its URL as the target URL when creating the Cloud Scheduler job.
Here, Cloud Scheduler will trigger the function without Pub/Sub.
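For reference, here is a minimal sketch of creating such a job programmatically with the Python client instead of the console; the project, region, schedule, URL, and service account are placeholders, and the OIDC token is only needed if the function requires authentication.

```python
# Sketch: Cloud Scheduler job with an HTTP target, no Pub/Sub in between.
from google.cloud import scheduler_v1

client = scheduler_v1.CloudSchedulerClient()
parent = "projects/my-project/locations/us-central1"

job = scheduler_v1.Job(
    name=f"{parent}/jobs/call-my-function-daily",
    schedule="0 9 * * *",  # once a day at 09:00
    time_zone="Etc/UTC",
    http_target=scheduler_v1.HttpTarget(
        uri="https://us-central1-my-project.cloudfunctions.net/my-function",
        http_method=scheduler_v1.HttpMethod.POST,
        # Only needed if the function does not allow unauthenticated calls.
        oidc_token=scheduler_v1.OidcToken(
            service_account_email="scheduler-invoker@my-project.iam.gserviceaccount.com"
        ),
    ),
)

client.create_job(parent=parent, job=job)
```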

Is it possible to do micro-batching on serverless platforms like Cloud Run?

I heavily use Google Cloud Run, for many reasons; one of them is the simplicity of treating each request as stateless and handling it individually.
However, I was thinking recently that for a service we have which simply writes data to a DB, it would be very handy to batch a few requests rather than write each one individually. Is this possible on serverless platforms, specifically Cloud Run?
Because Cloud Run is stateless, you can't stack up the requests (that is, keep them, which would be stateful) and process them later. You need an intermediary layer for that.
One good way, which I have already implemented, is to publish the requests to Pub/Sub (either directly, or via a Cloud Run/Cloud Functions service that receives each request and turns it into a Pub/Sub message).
Then you can create a Cloud Scheduler job that triggers a Cloud Run service. That Cloud Run service pulls the Pub/Sub subscription and reads a bunch of messages (maybe all of them). You then have all the "requests" in a batch and can process them inside the scheduler-initiated request (don't forget that you can't process in the background with Cloud Run; you must be in a request context. -> for now ;) )
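A minimal sketch of that batching endpoint, assuming Flask, a pull subscription, and a hypothetical write_rows_to_db helper; Cloud Scheduler would simply POST to /flush on a schedule. The subscription name is a placeholder.

```python
# Sketch: Cloud Run endpoint that Cloud Scheduler hits. It pulls a batch of
# messages from a Pub/Sub pull subscription, writes them to the DB in one go,
# and only acks what was written.
import json

from flask import Flask
from google.cloud import pubsub_v1

app = Flask(__name__)
subscriber = pubsub_v1.SubscriberClient()
SUBSCRIPTION = "projects/my-project/subscriptions/db-writes-pull"

def write_rows_to_db(rows):
    """Hypothetical helper: insert all rows in a single batched DB call."""
    ...

@app.route("/flush", methods=["POST"])
def flush():
    response = subscriber.pull(
        request={"subscription": SUBSCRIPTION, "max_messages": 500}
    )
    if not response.received_messages:
        return ("nothing to do", 200)

    rows = [json.loads(m.message.data) for m in response.received_messages]
    write_rows_to_db(rows)  # one batched write instead of one write per request

    subscriber.acknowledge(
        request={
            "subscription": SUBSCRIPTION,
            "ack_ids": [m.ack_id for m in response.received_messages],
        }
    )
    return (f"flushed {len(rows)} messages", 200)
```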
I think you can give these blog posts a try; I've done some reading and it looks like you can pull some good ideas from them:
Running a serverless batch workload on GCP with Cloud Scheduler, Cloud Functions, and Compute Engine
Batching Jobs in GCP Using the Cloud Scheduler and Functions
Here is another Stack Overflow thread that shows a similar approach.

How to test a Google Cloud Storage-triggered Cloud Function locally?

With reference to Google Cloud Storage Triggers, I wrote a background function which gets triggered by GCS. Using the GCP Functions Framework for Java, is it possible to test it locally?
You can test your Cloud Function triggers locally by running integration tests.
Integration tests should trigger and respond to actual Cloud events
such as HTTP requests, Pub/Sub messages, or Storage object changes.
Source
Here you can find more information about testing background Cloud Functions.
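The question is about Java, but the idea is language-agnostic: call the function's entry point directly with a hand-built event instead of waiting for a real GCS upload. A minimal sketch in Python, where the module name (main), function name (process_file), and event fields are placeholders for your own code:

```python
# Sketch: local test for a GCS-triggered background function, invoked with a
# fabricated "object finalize" event.
from unittest import mock

import main  # the module that defines the background function


def test_gcs_finalize_event():
    event = {
        "bucket": "my-test-bucket",
        "name": "uploads/example.csv",
        "contentType": "text/csv",
        "size": "1024",
    }
    context = mock.Mock()
    context.event_id = "1234567890"
    context.event_type = "google.storage.object.finalize"

    main.process_file(event, context)  # should run without raising
```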

Amazon Systems Manager alternative on GCP

Is there a solution/service available on GCP along the lines of Systems Manager?
My end goal is to run a shell script on a GCP VM on specific events.
For example, on AWS I was able to trigger a Lambda function via EventBridge, and the function in turn triggered an SSM command on a specific VM.
Is this possible on GCP?
There isn't a Systems Manager equivalent in GCP.
A Pub/Sub subscription from the VMs/compute units that triggers a Lambda-like function (a Cloud Function in GCP) is a suboptimal solution and is different from what Systems Manager accomplishes.
I don't know what kind of events you have in mind that would trigger running a script, but you can check out the tutorial on how to run a function using Pub/Sub. It shows how to use scheduler-based events, but it's possible to use non-scheduled triggers:
Events are things that happen within your cloud environment that you might want to take action on. These might be changes to data in a database, files added to a storage system, or a new virtual machine instance being created. Currently, Cloud Functions supports events from the following providers:
HTTP
Cloud Storage
Cloud Pub/Sub
Cloud Firestore
Firebase (Realtime Database, Storage, Analytics, Auth)
Stackdriver Logging—forward log entries to a Pub/Sub topic by creating a sink. You can then trigger the function.
And here you can read about how to implement those triggers.
For example, this documentation explains how to use Cloud Storage-based triggers with Pub/Sub.
If you provide more details of what exactly you want to achieve (which events have to trigger what), then I can point you to a more direct solution.
The approach depends on the exact use case you have in hand. One common architectural option is to use Pub/Sub with Cloud Functions: based on messages published to Pub/Sub topics, Cloud Functions that perform the operations of interest can be triggered/invoked in the same cloud project.
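To make the Pub/Sub + Cloud Functions option concrete, here is a minimal sketch of the function side in Python. Note that the "run the script on the VM" step is something you have to provide yourself (for example a small HTTP agent running on the VM, which is purely hypothetical here); GCP has no built-in equivalent of SSM Run Command, and the requests library would need to be listed in the function's requirements.

```python
# Sketch: background Cloud Function triggered by a Pub/Sub message. The VM
# agent URL is hypothetical; something on the VM has to actually execute the
# shell script.
import base64
import json

import requests


def on_event(event, context):
    """Entry point for a Pub/Sub-triggered Cloud Function."""
    payload = {}
    if "data" in event:
        payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))

    # Forward the event to whatever you run on the VM to execute the script.
    requests.post(
        "http://10.128.0.5:8080/run-script",  # hypothetical agent on the VM
        json={"event": payload, "event_id": context.event_id},
        timeout=30,
    )
```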

Create/update in Datastore triggers Cloud Function

I have a database in Google Datastore. I don't know how to use Cloud Functions, but I want to trigger an event after a creation or an update.
Unfortunately the documentation is light on the subject: https://cloud.google.com/appengine/docs/standard/java/datastore/callbacks
I don't know how I could use @PostPut to trigger an event as soon as a line is created or updated.
Does anyone have a tutorial with a basic example?
Thank you
Dan MacGrath provided an answer to a similar request (callbacks are indeed discussed below). Such a solution doesn't exist yet. As a workaround, taking into account the currently available triggers:
HTTP—invoke functions directly via HTTP requests.
Cloud Storage
Cloud Pub/Sub
Firebase (DB, Storage, Analytics, Auth)
Stackdriver Logging—forward log entries to a Pub/Sub topic by creating a sink. You can then trigger the function.
I would suggest a couple of solutions:
Saving something to a specific Cloud Storage bucket every time a line is created or updated, to trigger a linked Cloud Function (see the sketch below). You can delete the bucket contents afterwards.
Creating logs with the same name and then forwarding them to Pub/Sub by creating a sink.
EDIT 1
Cloud Storage triggers for Cloud Functions: official Google doc and tutorial with sample code in Node.js 6 on GitHub.
Cloud Pub/Sub triggers for Cloud Functions: official Google doc and tutorial with sample code in Node.js 6 on GitHub (the same as before).
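A minimal sketch of the first workaround in Python, for brevity; the bucket, kind, and payload are placeholders, and the Cloud Function that reacts to the bucket is deployed separately with a Cloud Storage trigger.

```python
# Sketch of workaround 1: after every Datastore put, drop a small marker
# object into a bucket whose "object finalize" event triggers a Cloud Function.
import json

from google.cloud import datastore, storage

ds = datastore.Client()
gcs = storage.Client()
trigger_bucket = gcs.bucket("my-datastore-trigger-bucket")


def save_row(kind, row_id, properties):
    key = ds.key(kind, row_id)
    entity = datastore.Entity(key=key)
    entity.update(properties)
    ds.put(entity)

    # Writing this object fires the Cloud Function linked to the bucket.
    blob = trigger_bucket.blob(f"{kind}/{row_id}.json")
    blob.upload_from_string(json.dumps({"kind": kind, "id": row_id}))
```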
Cloud Datastore does not support real-time triggers on CRUD (Create, Read, Update, Delete) events.
However, you can migrate to Cloud Firestore, which does support real-time triggers for those actions (Firestore document events can invoke a Cloud Function directly). Cloud Firestore is the successor to Cloud Datastore and may eventually supplant it at some point in the future.
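If you do migrate, a Firestore-triggered background function looks roughly like this (1st-gen Python runtime shown for brevity, even though the question links the Java docs; the collection path, function name, and fields below are placeholders):

```python
# Sketch: Cloud Function triggered on Firestore document writes. Deployed with
# a trigger such as:
#   --trigger-event providers/cloud.firestore/eventTypes/document.write
#   --trigger-resource "projects/MY_PROJECT/databases/(default)/documents/items/{itemId}"

def on_item_write(data, context):
    """Runs on every create, update, or delete of a matching document."""
    path = context.resource           # full path of the document that changed
    old_value = data.get("oldValue")  # empty on create
    new_value = data.get("value")     # empty on delete
    print(f"Document changed: {path}")
    print(f"Before: {old_value}")
    print(f"After: {new_value}")
```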