Create/update in datastore triggers Cloud function - google-cloud-platform

I have a database in Google Datastore. I don't know how to use Cloud Functions, but I want to trigger an event after a creation or an update.
Unfortunately the documentation is light on the subject: https://cloud.google.com/appengine/docs/standard/java/datastore/callbacks
I don't know how I could use @PostPut to trigger an event as soon as a row is created or updated.
Does anyone have a tutorial with a basic example?
Thank you.

Dan MacGrath provided an answer to a similar request (see his answer below): such a solution doesn't exist yet. As a workaround, taking into account the currently available triggers:
HTTP—invoke functions directly via HTTP requests.
Cloud Storage
Cloud Pub/Sub
Firebase (DB, Storage, Analytics, Auth)
Stackdriver Logging—forward log entries to a Pub/Sub topic by creating a sink. You can then trigger the function.
I would suggest a couple of solutions:
Save an object to a dedicated Cloud Storage bucket every time a row is created or updated, to trigger a linked Cloud Function (a sketch of this is at the end of this answer). You can delete the bucket contents afterwards.
Write log entries with a recognizable name and forward them to Pub/Sub by creating a sink. The topic can then trigger the function.
EDIT 1
Cloud Storage triggers for Cloud Functions: official Google doc and tutorial with sample code in Node.js 6 on GitHub.
Cloud Pub/Sub triggers for Cloud Functions: official Google doc and tutorial with sample code in Node.js 6 on GitHub (the same as before).
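To illustrate the first workaround, here is a minimal Python sketch. It assumes a hypothetical trigger bucket named datastore-change-trigger and an entity kind Order; the Cloud Function attached to that bucket would then fire on every upload:

import json
from google.cloud import datastore, storage

datastore_client = datastore.Client()
storage_client = storage.Client()

def save_order(payload):
    # Normal Datastore write ("Order" is just an example kind).
    entity = datastore.Entity(key=datastore_client.key("Order"))
    entity.update(payload)
    datastore_client.put(entity)

    # Workaround: drop a small marker object into a dedicated bucket.
    # A Cloud Function with a google.storage.object.finalize trigger on
    # this bucket is then invoked for every create/update.
    bucket = storage_client.bucket("datastore-change-trigger")  # hypothetical bucket
    blob = bucket.blob(f"order-{entity.key.id}.json")
    blob.upload_from_string(json.dumps(payload), content_type="application/json")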

Cloud Datastore does not support real-time triggers on CRUD (Create, Read, Update, Delete) events.
However, you can migrate to Cloud Firestore, which does support real-time triggers for those actions (by way of Cloud Pub/Sub, which can be made to invoke a Cloud Function). Cloud Firestore is the successor to Cloud Datastore and may eventually supplant it at some point in the future.
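If you do migrate, a Firestore-triggered background function in the Python runtime looks roughly like this (a minimal sketch; the collection orders and the field handling are placeholders, not taken from the question):

# main.py - deployed with, for example:
#   gcloud functions deploy on_order_write \
#     --runtime python39 \
#     --trigger-event providers/cloud.firestore/eventTypes/document.write \
#     --trigger-resource "projects/YOUR_PROJECT/databases/(default)/documents/orders/{orderId}"

def on_order_write(data, context):
    """Fires on every create/update/delete of a document under 'orders'."""
    print(f"Triggered by change to: {context.resource}")
    # "value" holds the new document state (empty on deletes),
    # "oldValue" holds the previous state (empty on creates).
    new_value = data.get("value", {}).get("fields", {})
    old_value = data.get("oldValue", {}).get("fields", {})
    print("new:", new_value)
    print("old:", old_value)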

Related

Google Cloud Storage upload triggers python app alternatives to Cloud Function

I'd like to find out the best architecture for a Python app that gets triggered when a file is uploaded to Google Cloud Storage, does some processing, and outputs a file to Google Drive.
I've tried using Cloud Functions, but I'm getting "Function invocation was interrupted. Error: memory limit exceeded." in the logs.
I've also followed this tutorial, Trigger Cloud Run with events from Eventarc, so I know that one way is with Eventarc and Cloud Audit Logs.
2 questions:
What other methods are there since I require higher memory limits?
How do I get the bucket name and file name from Cloud Audit Logs? Through protoPayload.resourceName?
You can use Pub/Sub. Create a Pub/Sub notification on the bucket and a push subscription to the service that you want (see the sketch below):
HTTP Cloud Function
App Engine
Cloud Run
Any HTTP service running somewhere (VM, Kubernetes, on prem,...)
Eventarc is mainly a wrapper around this process and can only call Cloud Run services (for now).
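Here is a rough Python sketch of that setup (bucket, topic, project and endpoint names are placeholders; the same can be done with gsutil notification create and gcloud pubsub subscriptions create):

from google.cloud import pubsub_v1, storage

PROJECT = "my-project"                           # placeholder
BUCKET = "my-upload-bucket"                      # placeholder
TOPIC = "uploads"                                # placeholder, topic must already exist
ENDPOINT = "https://my-service-xyz.a.run.app/"   # Cloud Run or any HTTPS service

# 1. Attach a Pub/Sub notification to the bucket (fires when objects are finalized).
#    The Cloud Storage service account needs pubsub.publisher on the topic.
bucket = storage.Client().bucket(BUCKET)
notification = bucket.notification(
    topic_name=TOPIC,
    topic_project=PROJECT,
    event_types=["OBJECT_FINALIZE"],
    payload_format="JSON_API_V1",
)
notification.create()

# 2. Push every notification to the HTTP service of your choice.
subscriber = pubsub_v1.SubscriberClient()
subscriber.create_subscription(
    request={
        "name": f"projects/{PROJECT}/subscriptions/{TOPIC}-push",
        "topic": f"projects/{PROJECT}/topics/{TOPIC}",
        "push_config": {"push_endpoint": ENDPOINT},
    }
)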

Access to the Google Cloud Storage Trigger Events "Pub/Sub"?

I have a Google Cloud Storage trigger set up on a Cloud Function with max instances of 5, to fire on the google.storage.object.finalize event of a Cloud Storage bucket. The docs state that these events are "based on" Cloud Pub/Sub.
Does anyone know:
Is there any way to see configuration of the topic or subscription in the console, or through the CLI?
Is there any way to get the queue depth (or equivalent?)
Is there any way to clear events?
No, no and no. When you plug Cloud Functions into Cloud Storage events, everything is handled behind the scenes by Google: you see nothing and you can't interact with anything.
However, you can change the notification mechanism. Instead of plugging your Cloud Functions directly into the Cloud Storage event, plug a Pub/Sub topic onto your Cloud Storage event.
From there, you have access to YOUR Pub/Sub: monitor the queue, purge it, create the subscriptions that you want, and so on.
The recommended way to work with storage notifications is using Pub/Sub.
Legacy storage notifications still work, but with Pub/Sub you can "peek" into the message queue and clear it if you need to.
Also, you can process Pub/Sub events with Cloud Run, which is easier to develop and test (just a web service), easier to deploy (just a container), and can process several requests in parallel without paying more (great when you have a lot of requests arriving together).
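A minimal sketch of a Cloud Run service handling such a push subscription could look like this (Flask; the endpoint path and the field handling are illustrative assumptions):

import base64
import json

from flask import Flask, request

app = Flask(__name__)

@app.route("/", methods=["POST"])
def handle_push():
    # Pub/Sub push wraps the message; the GCS notification is base64-encoded in "data".
    envelope = request.get_json()
    message = envelope["message"]
    notification = json.loads(base64.b64decode(message["data"]).decode("utf-8"))
    print("bucket:", notification["bucket"], "object:", notification["name"])
    # ... do the actual processing here ...
    return ("", 204)  # 2xx acknowledges the message

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)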
Where do Pub/Sub storage notifications go?
You can see where a bucket's notifications go with the gsutil command:
% gsutil notification list gs://__bucket_name__
projects/_/buckets/__bucket_name__/notificationConfigs/1
Cloud Pub/Sub topic: projects/__project_name__/topics/__topic_name__
Filters:
Event Types: OBJECT_FINALIZE
Is there any way to get the queue depth (or equivalent?)
In Pub/Sub you can have many subscriptions to a topic.
If there is no subscription, messages get lost.
To send data to a Cloud Function or Cloud Run you set up a push subscription.
In my experience, you won't be able to see what happened because it is faster than you can click: you'll find this empty 99.9999% of the time.
You can check the "queue" depth in the console (Pub/Sub -> choose your topic -> choose the subscription).
If you need to troubleshoot this, set up a second subscription with a time to live low enough that it does not use a lot of space (you'll be billed for it).
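A small sketch of that troubleshooting idea, assuming you created a second pull subscription called debug-sub on the same topic:

from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription = subscriber.subscription_path("my-project", "debug-sub")  # placeholders

# Synchronously pull a few messages without acknowledging them,
# so they stay in the backlog and can be inspected again.
response = subscriber.pull(request={"subscription": subscription, "max_messages": 10})
for received in response.received_messages:
    print(received.message.data.decode("utf-8"))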
Is there any way to clear events?
You can empty the messages from the Pub/Sub subscription, but...
...if you're using a push subscription against a Cloud Function, it will be much faster than you can "click".
If you need it, it is in the web console (open the Pub/Sub subscription and click the vertical "..." at the top right).
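Outside the console, the same purge can be scripted by seeking the subscription to the current time (a hedged sketch; project and subscription names are placeholders):

from google.cloud import pubsub_v1
from google.protobuf import timestamp_pb2

subscriber = pubsub_v1.SubscriberClient()
subscription = subscriber.subscription_path("my-project", "my-subscription")

# Seeking to "now" marks everything published before this instant as acknowledged,
# which effectively empties the backlog.
now = timestamp_pb2.Timestamp()
now.GetCurrentTime()
subscriber.seek(request={"subscription": subscription, "time": now})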

Amazon Systems Manager alternative on GCP

Is there a solution/service available on GCP along similar lines to Systems Manager?
My end goal is to run a shell script on GCP VM on specific events.
For example, on AWS I was able to trigger a Lambda function via EventBridge, and the function in turn triggered an SSM command for a specific VM.
Is this possible on GCP?
There isn't a Systems Manager equivalent in GCP.
A Pub/Sub subscription from the VMs/compute units that triggers a Cloud Function (GCP's equivalent of a Lambda function) is a suboptimal solution and different from what Systems Manager accomplishes.
I don't know what kind of events you have in mind that would trigger running a script, but you can check out the tutorial on how to run a function using Pub/Sub. It shows how to use scheduler-based events, but it's possible to use non-scheduled triggers:
Events are things that happen within your cloud environment that you might want to take action on. These might be changes to data in a database, files added to a storage system, or a new virtual machine instance being created. Currently, Cloud Functions supports events from the following providers:
HTTP
Cloud Storage
Cloud Pub/Sub
Cloud Firestore
Firebase (Realtime Database, Storage, Analytics, Auth)
Stackdriver Logging—forward log entries to a Pub/Sub topic by creating a sink. You can then trigger the function.
And here you can read about how to implement those triggers.
For example, this documentation explains how to use storage-based triggers with Pub/Sub.
If you provide more details of what exactly you want to achieve (what events have to trigger what) then I can point you to a more direct solution.
The approach depends on the exact use case you have in hand. One common architectural option could be using Pub/Sub with Cloud Functions: based on messages published to Pub/Sub topics, Cloud Functions performing the operations of interest can be triggered/invoked in the same Cloud project as the topic.
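As a rough illustration of that pattern, a Pub/Sub-triggered Cloud Function in the Python runtime could look like this (the topic name vm-events and what you do with the message are placeholders, not something GCP provides out of the box for running scripts on VMs):

import base64
import json

# Deployed with, for example:
#   gcloud functions deploy handle_event --runtime python39 --trigger-topic vm-events

def handle_event(event, context):
    """Background function triggered by messages on the 'vm-events' topic."""
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    print(f"Received event {context.event_id}: {payload}")
    # From here you could, for example, call an agent or endpoint on the VM
    # that actually runs the shell script.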

gcp - Trigger Cloud Function on database insert?

Not sure how to search this; I'm looking for a way to trigger a Cloud Function whenever a new row is inserted into a database in Cloud SQL. Searching for "google cloud function events" (or "triggers") turns up Firebase results, which is not what I want.
There is a series of Cloud Functions that receive data and transform it according to the clients' needs; in the end, after some manipulation, that data ends up in a table. Is there an event I can listen to so I can access the newly inserted rows? If not, I might end up using Cloud Scheduler and peeking regularly into the DB. However, this solution doesn't seem viable long-term.
I'd appreciate any advice.
Currently there is no official Cloud Function event which could be triggered on changes to a Cloud SQL database. You can check the available events in the Events and Triggers documentation.
You could still do something like it with Cloud Pub/Sub, and it could be done in 2 ways:
1 - The first would be to enable and export logs from the Cloud SQL instance to a Pub/Sub topic by creating a sink on Stackdriver, and have the Cloud Function listen to that topic.
Although this method does not require you to change the way you are inserting data to the DB, it might expose too much information, as all queries will be logged on Stackdriver. It also means you would not have full control of what information is passed to the function, as the message would be the contents of the log entry.
2 - The ideal solution would be to create the Pub/Sub topic and publish to it when you insert new data to the database. This way you have more control over the information sent to the topic. You can find more information about how to set up a new topic in the Cloud Pub/Sub documentation.
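A minimal sketch of option 2, assuming a hypothetical topic named row-inserted and whichever DB driver you already use for the insert:

import json

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "row-inserted")  # placeholders

def insert_row(db_conn, row):
    # 1. Your existing Cloud SQL insert (pseudo-code for whichever driver you use).
    with db_conn.cursor() as cur:
        cur.execute(
            "INSERT INTO my_table (col_a, col_b) VALUES (%s, %s)",
            (row["col_a"], row["col_b"]),
        )
    db_conn.commit()

    # 2. Publish the same payload so a Pub/Sub-triggered Cloud Function can react.
    future = publisher.publish(topic_path, json.dumps(row).encode("utf-8"))
    future.result()  # optional: block until the publish succeeds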

Rate limited API requests in Cloud Composer

I'm planning a project whereby I'd be hitting the (rate-limited) Reddit API and storing data in GCS and BigQuery. Initially, Cloud Functions would be the choice, but I'd have to create a Datastore implementation to manage the "pseudo" queue of requests, plus GAE for cron jobs.
Doing everything in Dataflow wouldn't make sense because it's not advisable to make external requests (i.e. hitting the Reddit API) or to perpetually run a single job.
Could I use Cloud Composer to read fields from a Google Sheet, then create a queue of requests based on the Google Sheet, then have a task queue execute those requests, store them in GCS and load into BigQuery?
Sounds like a legitimate use case for Composer. Additionally, you could leverage the pool concept in Airflow to manage concurrent calls to the same endpoint (e.g., the Reddit API).
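A hedged sketch of how that could look in a Composer DAG (Airflow 2-style imports; the pool reddit_api would first be created in the Airflow UI or CLI with a slot count matching the rate limit, and the task names and callable are placeholders):

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def fetch_subreddit(subreddit, **kwargs):
    # Placeholder: call the Reddit API, write the result to GCS, load into BigQuery, etc.
    print(f"fetching r/{subreddit}")

with DAG(
    dag_id="reddit_ingest",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    for subreddit in ["python", "dataengineering", "googlecloud"]:
        PythonOperator(
            task_id=f"fetch_{subreddit}",
            python_callable=fetch_subreddit,
            op_kwargs={"subreddit": subreddit},
            # All these tasks share the 'reddit_api' pool, so Airflow only runs
            # as many of them concurrently as the pool has slots.
            pool="reddit_api",
        )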