Trigger an AWS Lambda when a table is created in BigQuery

Our Google Analytics data events are exported to BigQuery tables. I have reports that need to run when the events data arrives; these are set up as AWS Lambdas running Python code (for various reasons, and I can't immediately move them to Google Cloud Functions etc.).
Is it possible to have the creation of a table trigger a Lambda? At present, I have a Lambda periodically checking whether the table has been created, which seems suboptimal. Eventarc looks like it might be the way to monitor for the creation event on the BigQuery side, but it isn't obvious how you'd interface with AWS.
Any genius ideas? I have dug repeatedly through Stack Overflow but can't see a match for this issue.

Eventarc isn't magic; it's only a wrapper around things that you can build and customize yourself (with a custom destination, not only Cloud Run).
Typically, Eventarc does the following:
Creates a Cloud Logging sink with a specific log filter (filter whatever you want in order to get custom events)
Routes the filtered log entries to a Pub/Sub topic
Creates a Pub/Sub push subscription that invokes a Cloud Run HTTP endpoint.
You can build all those steps piece by piece yourself, and in the last one invoke your AWS Lambda instead of Cloud Run.
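As a rough illustration of wiring those pieces together yourself with the Python client libraries, here is a minimal sketch that creates a Logging sink routing BigQuery table-creation audit log entries to a Pub/Sub topic, then adds a push subscription pointing at an HTTPS endpoint in front of your Lambda (a Lambda Function URL or API Gateway URL). The project, topic, endpoint, and the exact log filter are assumptions to adapt, not values from the question.

# Hypothetical one-off setup: Cloud Logging sink -> Pub/Sub -> push to a Lambda HTTPS endpoint.
# Project, topic, and endpoint names are placeholders; the log filter is illustrative and
# depends on which BigQuery audit log format you rely on.
from google.cloud import logging as gcp_logging
from google.cloud import pubsub_v1

PROJECT = "my-gcp-project"                                        # placeholder
TOPIC = "bq-table-created"                                        # placeholder
LAMBDA_ENDPOINT = "https://example.lambda-url.eu-west-1.on.aws/"  # placeholder HTTPS endpoint

log_filter = (
    'resource.type="bigquery_resource" '
    'protoPayload.methodName="tableservice.insert"'
)

logging_client = gcp_logging.Client(project=PROJECT)
sink = logging_client.sink(
    "bq-table-created-sink",
    filter_=log_filter,
    destination=f"pubsub.googleapis.com/projects/{PROJECT}/topics/{TOPIC}",
)
if not sink.exists():
    sink.create()
    # The sink's writer identity still needs roles/pubsub.publisher on the topic.
    print("Grant Pub/Sub publish rights to:", sink.writer_identity)

subscriber = pubsub_v1.SubscriberClient()
subscriber.create_subscription(
    request={
        "name": subscriber.subscription_path(PROJECT, "bq-table-created-push"),
        "topic": f"projects/{PROJECT}/topics/{TOPIC}",
        "push_config": {"push_endpoint": LAMBDA_ENDPOINT},
    }
)

The Lambda then receives the standard Pub/Sub push payload (the log entry is base64-encoded in message.data) and must return a 2xx status so the message is acknowledged.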
But the difficulty is not there. The difficulty comes from the variety of ways a table can be created:
By API call (table creation API)
By load job (loading a file into a table creates it automatically, but without invoking the table creation API)
Directly in SQL with a CREATE TABLE statement (but this statement can also appear in a script, in dynamic SQL, ...)
And you might also want to capture the other creations (views, materialized views, procedures, functions, ...)
In the end, your current method (periodically querying the schema metadata and picking up the recent additions in a dataset) could be the most "effortlessly" efficient!
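For completeness, a minimal sketch of that polling approach with the google-cloud-bigquery client (this could run inside the scheduled Lambda); the project, dataset, and table names are placeholders.

# Minimal polling sketch: check whether the expected table has appeared yet.
# Project/dataset/table names are placeholders.
from google.cloud import bigquery
from google.cloud.exceptions import NotFound

def table_exists(client: bigquery.Client, table_id: str) -> bool:
    try:
        client.get_table(table_id)   # raises NotFound if the table isn't there yet
        return True
    except NotFound:
        return False

client = bigquery.Client(project="my-gcp-project")
if table_exists(client, "my-gcp-project.analytics.events_20240101"):
    print("Table is there - kick off the reports")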

Related

Automatic scheduler triggered after BigQuery table load

It's about automatically triggering a scheduler after a BigQuery table gets loaded.
In brief:
I don't want to use a weekly scheduled query, which is a manual task; I want the scheduler to trigger automatically when the table gets loaded into BigQuery.
Currently I'm using a manual weekly scheduled query; I want the query to trigger automatically without any manual steps.
I am just trying to think through the logic but not getting anywhere.
Airflow doesn't provide an external trigger in the DAG itself, but you can create your DAG and trigger it from a Cloud Function once the table is loaded; this Cloud Function sends a POST HTTP request to the Airflow API to trigger a new run with all the params your DAG needs to process the data.
To trigger the Cloud Function, you can check this answer, which explains how to use Eventarc to listen to your BigQuery logs in Cloud Logging, filter them, and trigger the Cloud Function run.
To trigger the DAG run, you can check the Airflow API doc and the API security doc in order to allow the Cloud Function to send requests to the API.
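A minimal sketch of that Cloud Function, assuming an Airflow 2 deployment with the stable REST API and basic auth enabled; the Airflow URL, credentials, and DAG id are placeholders.

# Hypothetical Pub/Sub-triggered Cloud Function: start an Airflow DAG run over the stable REST API.
# URL, credentials, and DAG id are placeholders.
import requests

AIRFLOW_URL = "https://my-airflow.example.com"   # placeholder
DAG_ID = "process_bq_table"                      # placeholder

def trigger_dag(event, context):
    resp = requests.post(
        f"{AIRFLOW_URL}/api/v1/dags/{DAG_ID}/dagRuns",
        json={"conf": {"source": "bigquery-load"}},   # params your DAG needs
        auth=("api_user", "api_password"),            # placeholder credentials
        timeout=30,
    )
    resp.raise_for_status()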

Triggering a Materialized View Refresh - AWS Lambda

I'm trying to create an architecture on AWS where a Lambda function runs SQL code to refresh a materialized view on AWS Redshift. I would like the materialized view to refresh after the daily ETL processes have completed on the Redshift cluster. Is there a way of setting up the Lambda function to be triggered after a particular SQL command on the Redshift cluster has completed?
Unfortunately, I've only seen examples of people scheduling the Lambda function to run at particular intervals/at a particular time. Any help would be much appreciated.
A couple of ways that this can be done (out of many):
Have the ETL process trigger the Lambda - this is straightforward if the ETL tool can generate the trigger, but organizational factors can make changing ETL frameworks difficult.
Use an S3 semaphore - have your ETL SQL UNLOAD some small data (like a text string of metadata) to S3, where the object's creation will trigger the Lambda. Insert the UNLOAD at the point in the ETL SQL where you want the update to occur (see the sketch below).
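As an illustration of the S3 semaphore option, a hedged sketch of the Lambda side using the Redshift Data API; the cluster, database, user, and view names are placeholders, and the Lambda is assumed to be wired to the bucket's ObjectCreated notification for the semaphore prefix.

# Hypothetical S3-triggered Lambda: refresh the materialized view via the Redshift Data API.
# Cluster/database/user/view names are placeholders.
import boto3

redshift_data = boto3.client("redshift-data")

def handler(event, context):
    # Invoked by the s3:ObjectCreated:* notification on the semaphore object.
    redshift_data.execute_statement(
        ClusterIdentifier="my-cluster",      # placeholder
        Database="analytics",                # placeholder
        DbUser="etl_user",                   # placeholder
        Sql="REFRESH MATERIALIZED VIEW my_schema.my_mv;",
    )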

Compute Engine VM Creation Notification

I want to get notified if/when any VM is created in my infra on GCP.
I see a Google library that can give me a list of VMs.
I can create a function to use this code (probably).
Schedule the above function and check for differences.
But are storage-like triggers available for Compute Engine?
Also, is there any other solution?
You have a third solution. You can use Cloud Run instead of Cloud Functions (the migration is very easy, let me know if you have issues).
With Cloud Run, you can use a trigger (the Eventarc feature), a new feature (still in preview) based on the audit logs. It's very similar to the first solution proposed by LundinCast, but it's set up automatically by the Cloud Run trigger feature.
So, deploy your service on Cloud Run. Then configure a trigger on the v1.compute.instances.insert API, select your region or make the trigger global, and that's all! Your service will be triggered when a new instance is created.
When you configure the trigger, you will be asked to activate the audit logs to be able to use this feature. Because it's built in, it's done automatically for you!
Using a Logging sink and a Pub/Sub-triggered Cloud Function
First, export the relevant logs to a Pub/Sub topic of your choice by creating a Logging sink. Include the logs created automatically during VM creation with the following log filter (the two methodName values cover the beta and v1 APIs):
resource.type="gce_instance"
(protoPayload.methodName="beta.compute.instances.insert" OR protoPayload.methodName="compute.instances.insert")
Next, create a Cloud Function that triggers every time a new log entry is sent to the Pub/Sub topic. You can process this new message as per your needs.
Note that with this option you'll have to handle the notification yourself (for example, by sending an email). It is useful, though, if you want to send different notifications based on some condition or if you want to perform additional actions apart from the notification.
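For instance, a minimal sketch of such a Pub/Sub-triggered Cloud Function that decodes the exported log entry; the notification itself (email, Slack, ...) is left as a placeholder.

# Minimal Pub/Sub-triggered Cloud Function: decode the exported log entry and notify.
import base64
import json

def notify_on_vm_creation(event, context):
    entry = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    payload = entry.get("protoPayload", {})
    vm_name = payload.get("resourceName", "unknown")
    who = payload.get("authenticationInfo", {}).get("principalEmail", "unknown")
    # Placeholder: send an email, Slack message, etc. here.
    print(f"VM created: {vm_name} by {who}")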
Using a log-based metric and a Cloud Monitoring alert
You can use a log-based metric that filters logs for Compute Engine VM creation and set an alert on that metric to get notified.
First, create a counter log-based metric with a log filter similar to the one in the previous method, which will report a data point to Cloud Monitoring every time a new VM instance is created.
Then go to Cloud Monitoring and create an alert based on that metric that triggers every time a new data point is reported.
This option is the easiest to set up and supports various notification channels out-of-the-box.
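If you prefer to script the metric rather than create it in the console, a hedged sketch with the google-cloud-logging client is below; the project and metric names are placeholders, and the alert policy on top of the metric is still configured in Cloud Monitoring.

# Hypothetical script: create a counter log-based metric for VM creation.
from google.cloud import logging as gcp_logging

client = gcp_logging.Client(project="my-gcp-project")   # placeholder project
metric = client.metric(
    "vm-creation-count",                                 # placeholder metric name
    filter_=(
        'resource.type="gce_instance" '
        '(protoPayload.methodName="beta.compute.instances.insert" '
        'OR protoPayload.methodName="compute.instances.insert")'
    ),
    description="Counts Compute Engine instance insert calls",
)
if not metric.exists():
    metric.create()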
Going along with LundinCast's answer.
Cloud Run --
Would have used it if it had not been a zone issue for me, though I conclude this from a POC I did:
Easy setup.
Containerised Apps. Probably more code to maintain.
Public URL for app.
Out-of-the-box support for requirements like mine.
Cloud Function --
Sink setup for triggers can be time-consuming for a first-timer.
Easy coding and maintenance.

gcp - Trigger Cloud Function on database insert?

Not sure how to search this; I'm looking for a way to trigger a Cloud Function whenever a new row is inserted into a database in Cloud SQL. Searching for "google cloud function events" (or "triggers") turns up Firebase results, which is not what I want.
There is a series of Cloud Functions that receive data and transform it according to the clients' needs; in the end, after some manipulation, that data ends up in a table. Is there an event I can listen to so I can access the newly inserted rows? If not, I might end up using Cloud Scheduler and peeking regularly into the DB. However, this solution doesn't seem viable long-term.
I'd appreciate any advice.
Currently there is no official Cloud Function event which could be triggered on changes to a Cloud SQL database. You can check the available events in the Events and Triggers documentation.
You could still do something like it with Cloud Pub/Sub, and it could be done in 2 ways:
1 - The first would be to enable and export logs from the Cloud SQL instance to a Pub/Sub topic by creating a sink on Stackdriver, and have the Cloud Function listen to that topic.
Although this method does not require you to change the way you are inserting data to the DB, it might expose too much information, as all queries will be logged on Stackdriver. It also means you would not have full control of what information is passed to the function, as the message would be the contents of the log entry.
2 - The ideal solution would be to create the Pub/Sub topic and publish to it when you insert new data into the database. This way you have more control over the information sent to the topic. You can find more information about how to set up a new topic in the Cloud Pub/Sub documentation.
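A minimal sketch of that second option, assuming the insert already happens in Python code you control: publish a small message to the topic right after the row is written; the project and topic names are placeholders and the actual DB insert is elided.

# Hypothetical sketch: publish to Pub/Sub right after inserting the row.
import json
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-gcp-project", "rows-inserted")   # placeholders

def insert_and_notify(row: dict):
    # ... insert `row` into the Cloud SQL table here (your existing code) ...
    future = publisher.publish(topic_path, json.dumps(row).encode("utf-8"))
    future.result()   # block until Pub/Sub accepts the message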

Register AWS Redshift activity

As per AWS docs, there's no Redshift-Lambda integration yet.
What we would like to do is monitor Redshift activity in order to do something when a Redshift table is created, a COPY from S3 is made, or a bulk insert is performed.
Is there a way to register this kind of activity, and then do something like run a Lambda function in order to run a small script or so?
Redshift provides an event notification mechanism. You can find a full list of the event categories and messages here. If that covers the kind of information you are interested in, you can simply have your Lambda function add the SNS topic used by Redshift for event notifications as an event source, and your Lambda function will get called every time an event is sent by Redshift.
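A hedged sketch of that wiring with boto3: create a Redshift event subscription that publishes to an SNS topic, then subscribe the Lambda to that topic; all names and ARNs are placeholders, and the Lambda also needs a resource policy allowing SNS to invoke it.

# Hypothetical one-off setup: Redshift event notifications -> SNS -> Lambda.
# All names/ARNs are placeholders.
import boto3

SNS_TOPIC_ARN = "arn:aws:sns:eu-west-1:123456789012:redshift-events"              # placeholder
LAMBDA_ARN = "arn:aws:lambda:eu-west-1:123456789012:function:on-redshift-event"   # placeholder

boto3.client("redshift").create_event_subscription(
    SubscriptionName="my-cluster-events",    # placeholder
    SnsTopicArn=SNS_TOPIC_ARN,
    SourceType="cluster",
    SourceIds=["my-cluster"],                # placeholder
    Severity="INFO",
)

boto3.client("sns").subscribe(
    TopicArn=SNS_TOPIC_ARN,
    Protocol="lambda",
    Endpoint=LAMBDA_ARN,
)

boto3.client("lambda").add_permission(       # let SNS invoke the function
    FunctionName=LAMBDA_ARN,
    StatementId="allow-sns-invoke",
    Action="lambda:InvokeFunction",
    Principal="sns.amazonaws.com",
    SourceArn=SNS_TOPIC_ARN,
)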
You can enable audit logs that end up in S3.
All the info you want is also available in various admin tables with prefixes like stl_, stv_ and pg_. For example, COPY commands from S3 are recorded in stl_load_commits, and stl_utilitytext has info on non-select queries like CREATE.
As for triggering events, you could have S3 trigger a Lambda when one of the log files lands, or run occasional jobs that query the system tables and take action, using something like cron jobs or Airflow.
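For the system-table route, a minimal sketch of a scheduled job that looks for recent COPY commits via the Redshift Data API; the cluster, database, user, and look-back window are placeholders.

# Hypothetical scheduled job: look for recent COPY loads in stl_load_commits.
# Cluster/database/user are placeholders.
import time
import boto3

client = boto3.client("redshift-data")

resp = client.execute_statement(
    ClusterIdentifier="my-cluster",          # placeholder
    Database="analytics",                    # placeholder
    DbUser="monitor_user",                   # placeholder
    Sql="""
        SELECT query, filename, curtime
        FROM stl_load_commits
        WHERE curtime > DATEADD(minute, -15, GETDATE())
        ORDER BY curtime DESC;
    """,
)

# The Data API is asynchronous: poll until the statement finishes, then fetch rows.
status = None
while status not in ("FINISHED", "FAILED", "ABORTED"):
    time.sleep(1)
    status = client.describe_statement(Id=resp["Id"])["Status"]

if status == "FINISHED":
    rows = client.get_statement_result(Id=resp["Id"])["Records"]
    print(f"{len(rows)} recent COPY commits")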