google bigquery personalized alerts based on a table column - google-cloud-platform

I have a table in Google BigQuery in which I calculate a column (think of it as an anomaly detection column).
Is there a way, within GCP, to send a rule-based alert (e.g. once the value in the column is 1)?
If not, how would you recommend dealing with this?
Thanks

A solution could be to use Eventarc triggers: when data is inserted into your table, the job is written to Cloud Audit Logs, and that event can trigger a Cloud Run service.
In that Cloud Run service it's possible to inspect the column you mention and send notifications accordingly.
Here is a good reference on how to proceed.
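For illustration, a minimal sketch of such a Cloud Run service in Python could look like this. The project, dataset, table, anomaly_flag column and Pub/Sub topic names are placeholders, not taken from the question, and the service simply re-checks the table rather than parsing the audit-log payload:

    import os

    from flask import Flask, request
    from google.cloud import bigquery, pubsub_v1

    app = Flask(__name__)
    bq = bigquery.Client()
    publisher = pubsub_v1.PublisherClient()
    # Placeholder project and topic; subscribers turn these messages into emails, chat alerts, etc.
    topic = publisher.topic_path("my-project", "anomaly-alerts")

    @app.route("/", methods=["POST"])
    def handle_event():
        # Eventarc posts the BigQuery audit-log event here; its payload identifies
        # the finished job, but for simplicity this sketch just re-checks the table.
        audit_event = request.get_json(silent=True)
        print("Received event:", audit_event)
        rows = bq.query(
            "SELECT id FROM `my-project.my_dataset.my_table` WHERE anomaly_flag = 1"
        ).result()
        for row in rows:
            publisher.publish(topic, ("Anomaly detected for id %s" % row.id).encode("utf-8"))
        return ("", 204)

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))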

Because you already run a query that computes the column and determines which alerts to send, the solution is to take the result of that query and act on it.
You can export the data to a file (CSV, for example) and trigger a Cloud Function on file creation in Cloud Storage. The Cloud Function can read the file and raise one alert per line, send a single alert with a summary of the file, or send the file as an attachment.
You can also fetch all the rows of the query result and publish a Pub/Sub message for each one. That way you can process all the messages in parallel with Cloud Functions or Cloud Run (this time it isn't possible to have a single alert summarising all the rows, since the messages are unitary).
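A minimal sketch of the Pub/Sub variant, assuming the same placeholder table with an anomaly_flag column and a placeholder topic named anomaly-alerts:

    import json

    from google.cloud import bigquery, pubsub_v1

    bq = bigquery.Client()
    publisher = pubsub_v1.PublisherClient()
    topic = publisher.topic_path("my-project", "anomaly-alerts")  # placeholder project/topic

    rows = bq.query(
        "SELECT * FROM `my-project.my_dataset.my_table` WHERE anomaly_flag = 1"
    ).result()

    futures = []
    for row in rows:
        # One message per flagged row; a Cloud Function or Cloud Run subscriber
        # turns each message into an alert.
        payload = json.dumps(dict(row.items()), default=str).encode("utf-8")
        futures.append(publisher.publish(topic, payload))

    for future in futures:
        future.result()  # block until every message has been published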

Related

Can't find BigQuery Logs for GA4 events_intraday tables

I am trying to create a trigger for a Cloud Function to copy events_intraday table data as soon as new data has been exported.
So far I have been following this answer to generate a sink from Cloud Logging to Pub/Sub.
I have only been able to find logs for events_YYYYMMDD tables but none for events_intraday_YYYYMMDD, either in Cloud Logging or in the BigQuery job history (here are my queries for events tables and events_intraday tables on Cloud Logging).
Am I looking at the wrong place? How is it possible for the table to be updated without any logs being generated?
Update: There is one (1) log generated per day when the table is created, but "table update" logs are yet to be found.
Try
protoPayload.authorizationInfo.permission="bigquery.tables.create"
protoPayload.methodName="google.cloud.bigquery.v2.TableService.InsertTable"
protoPayload.resourceName : "projects/'your_project'/datasets/'your_dataset'/tables/events_intraday_"
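If it helps, you can run essentially the same filter from the Python client to confirm whether any intraday-table entries exist; the project and dataset names below are placeholders:

    from google.cloud import logging

    client = logging.Client()
    log_filter = (
        'protoPayload.methodName="google.cloud.bigquery.v2.TableService.InsertTable" '
        'AND protoPayload.resourceName:"projects/your_project/datasets/your_dataset/tables/events_intraday_"'
    )
    for entry in client.list_entries(filter_=log_filter):
        print(entry.timestamp, entry.log_name)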

Integration of BigQuery and Dialogflow

I'm getting started on Dialogflow and I would like to integrate it with BigQuery.
I have some tables in BigQuery with different data, for instance a record of the alarms that a wind turbine raised over time.
In one of my test cases, let's say I want my chatbot to tell me what alarms were raised in the wind turbine number 5 of my farm, on the 25th of October.
I have already created a chatbot in Dialogflow that asks for all the necessary parameters of the enquiry, such as the wind farm name, the wind turbine number, the date, and the name of the alarm.
My doubt now is how I can send those parameters to BigQuery in order to dig into my tables, extract the required information, and print it in Dialogflow.
I have been looking for documentation or tutorials but haven't found anything that fits my case...
Thanks in advance!
You need to implement a fulfillment. It triggers a webhook, for example a Cloud Functions or Cloud Run service.
The webhook call contains the values gathered by your intent and its parameters. You have to extract them and perform your processing, for example a call to BigQuery, then format the response and return it to Dialogflow.
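As a rough sketch (not official Dialogflow sample code), a Python Cloud Function acting as the fulfillment webhook could look like the following; the table, column and parameter names are placeholders matching the wind-turbine example above:

    from google.cloud import bigquery

    bq = bigquery.Client()

    def dialogflow_webhook(request):
        # Entry point of an HTTP Cloud Function registered as the Dialogflow
        # fulfillment webhook (Dialogflow ES request format).
        body = request.get_json(silent=True) or {}
        params = body.get("queryResult", {}).get("parameters", {})

        query = """
            SELECT alarm_name, alarm_time
            FROM `my-project.my_dataset.turbine_alarms`
            WHERE farm_name = @farm
              AND turbine_number = @turbine
              AND DATE(alarm_time) = @day
        """
        job = bq.query(
            query,
            job_config=bigquery.QueryJobConfig(
                query_parameters=[
                    bigquery.ScalarQueryParameter("farm", "STRING", params.get("wind_farm")),
                    bigquery.ScalarQueryParameter("turbine", "INT64", int(params.get("turbine_number", 0))),
                    # Dialogflow's @sys.date gives an ISO datetime string; keep only the date part.
                    bigquery.ScalarQueryParameter("day", "DATE", str(params.get("date", ""))[:10]),
                ]
            ),
        )
        alarms = [row.alarm_name for row in job.result()]
        text = ", ".join(alarms) if alarms else "No alarms were raised."
        # Dialogflow ES reads the reply from fulfillmentText.
        return {"fulfillmentText": text}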

How to monitor if a BigQuery table contains current data and send an alert if not?

I have a BigQuery table and an external data import process that should add entries every day. I need to verify that the table contains current data (with a timestamp of today). Writing the SQL-query is not a problem.
My question is how best to set up such monitoring in GCP. Can Stackdriver execute custom BigQuery SQL? Or would a Cloud Function be more suitable? An App Engine application with a cron job? What's the best practice?
Not sure what the best practice is here, but one simple solution is to use a BigQuery scheduled query: schedule the query, make it fail if something is wrong by using the ERROR() function, and configure the scheduled query to notify you (it sends an email) when it fails.
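A minimal sketch of that idea, assuming a placeholder table my-project.my_dataset.my_table with a ts timestamp column. The scheduled query can equally be created in the console; this just shows the ERROR() check and the failure-email option:

    from google.cloud import bigquery_datatransfer

    # Fails (and therefore triggers the failure email) when no row carries today's date.
    freshness_check = """
        SELECT IF(
          (SELECT COUNT(*) FROM `my-project.my_dataset.my_table`
           WHERE DATE(ts) = CURRENT_DATE()) > 0,
          'fresh',
          ERROR('my_table has no rows for today'))
    """

    client = bigquery_datatransfer.DataTransferServiceClient()
    config = bigquery_datatransfer.TransferConfig(
        display_name="daily freshness check",
        data_source_id="scheduled_query",
        params={"query": freshness_check},
        schedule="every 24 hours",
        email_preferences=bigquery_datatransfer.EmailPreferences(enable_failure_email=True),
    )
    config = client.create_transfer_config(
        parent=client.common_project_path("my-project"),
        transfer_config=config,
    )
    print("Created scheduled query:", config.name)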

Can we schedule StackDriver Logging to export log?

I am new to Google Cloud Stackdriver Logging, and as per this documentation Stackdriver stores the Data Access audit logs for 30 days. It is also mentioned on the same page that the size of a log entry is limited to 100 KB.
I am aware that the logs can be exported to Google Cloud Storage using the Cloud SDK as well as the logging libraries in many languages (we prefer Python).
I have two questions related to the exporting the logs, which are:
Is there any way in Stackdriver to schedule something similar to a task or cron job that keeps exporting the logs to Google Cloud Storage automatically after a fixed interval of time?
What happens to log entries that are larger than 100 KB? I assume they get truncated. Is my assumption correct? If yes, is there any way in which we can export/view the full (not truncated) log entry?
Is there any way in Stackdriver to schedule something similar to a task or cron job that keeps exporting the logs to Google Cloud Storage automatically after a fixed interval of time?
Stackdriver supports exporting log data via sinks. There is no schedule that you can set, as everything is automatic: the data is exported as soon as possible, and you have no control over the amount exported at each sink or the delay between exports. I have never found this to be an issue. Logging, by design, is not meant to be used as a real-time system. The closest is to sink to Pub/Sub, which has a delay of a couple of seconds (based on my experience).
There are two methods to export data from Stackdriver:
Create an export sink. Supported destinations are BigQuery, Cloud Storage and Pub/Sub. The log entries will be written to the destination automatically. You can then use tools to process the exported entries. This is the recommended method; a minimal Python sketch follows below this list.
Write your own code in Python, Java, etc. to read the log entries and do what you want with them. Scheduling is up to you. This method is manual and requires your management of schedule and destination.
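For the first method, a minimal sketch with the Python client could look like this; the sink name, filter and bucket are placeholders:

    from google.cloud import logging

    client = logging.Client()
    sink = client.sink(
        "my-gcs-sink",
        filter_='logName:"cloudaudit.googleapis.com%2Fdata_access"',
        destination="storage.googleapis.com/my-log-archive-bucket",
    )
    sink.create()
    # The sink's writer identity still needs write access to the destination bucket.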
What happens to log entries that are larger than 100 KB? I assume they get truncated. Is my assumption correct? If yes, is there any way in which we can export/view the full (not truncated) log entry?
Entries that exceed the maximum size cannot be written to Stackdriver. The API call that attempts to create the entry will fail with an error message similar to this (Python error message):
400 Log entry with size 113.7K exceeds maximum size of 110.0K
This means that entries that are too large will be discarded unless the writer has logic to handle this case.
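If you control the code that writes the entries, a guard like the following sketch avoids losing them entirely; the log name is a placeholder and the exact size limit should be taken from the current quota documentation:

    import json

    from google.cloud import logging

    MAX_BYTES = 100 * 1024  # approximate per-entry limit discussed above

    client = logging.Client()
    logger = client.logger("my-app-log")  # placeholder log name

    def write_entry(payload: dict) -> None:
        text = json.dumps(payload, default=str)
        if len(text.encode("utf-8")) > MAX_BYTES:
            # Too large for a single entry: truncate here, or split it, or store
            # the full payload in Cloud Storage and log a pointer to it instead.
            text = text.encode("utf-8")[: MAX_BYTES - 1024].decode("utf-8", "ignore")
        logger.log_text(text)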
As per the Stackdriver Logging documentation, the whole process is automatic. An export sink to Google Cloud Storage is slower than the BigQuery and Cloud Pub/Sub sinks; see the documentation link.
I recently used an export sink to BigQuery, which is preferable to Cloud Pub/Sub if you don't want to use another third-party application for log analysis. A BigQuery sink needs a dataset in which to store the log entries. I noticed that the sink creates BigQuery tables on a timestamp (daily) basis in that dataset.
One more thing: if you want to query timestamp-partitioned tables, check this link:
Legacy SQL Functions and Operators
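As a sketch, the same thing can be done in standard SQL with a table wildcard and _TABLE_SUFFIX instead of the legacy TABLE_DATE_RANGE function; the project, dataset, table prefix and column names below are placeholders that depend on what your sink actually exported:

    from google.cloud import bigquery

    bq = bigquery.Client()
    query = """
        SELECT timestamp, protopayload_auditlog.methodName AS method_name
        FROM `my-project.my_log_dataset.cloudaudit_googleapis_com_data_access_*`
        WHERE _TABLE_SUFFIX BETWEEN
              FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
          AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
        LIMIT 100
    """
    for row in bq.query(query).result():
        print(row.timestamp, row.method_name)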

Matillion for Amazon Redshift support for job monitoring

I am working on Matillion for Amazon Redshift and we have multiple jobs running daily, triggered by SQS messages. Now I am checking the possibility of creating a UI dashboard for stakeholders which will monitor the live progress of jobs and show a report of previous jobs: job name, tables impacted, job status/reason for failure, etc. Does Matillion maintain this kind of information implicitly, or will I have to maintain this information for each job myself?
Matillion has an API which you can use to obtain details of all task history. Information on the tasks API is here:
https://redshiftsupport.matillion.com/customer/en/portal/articles/2720083-loading-task-information?b_id=8915
You can use this to pull data on either currently running jobs or completed jobs down to component level including name of job, name of component, how long it took to run, whether it ran successfully or not and any applicable error message.
This information can be pulled into a Redshift table using the Matillion API profile which comes built into the product and the API Query component. You could then build your dashboard on top of this table. For further information I suggest you reach out to Matillion via their Support Center.
The API is helpful, but you can only pass a date as a parameter (this is for Matillion for Snowflake; I assume it's the same for Redshift). I've requested the ability to pass a datetime so we can run the jobs throughout the day and not pull back the same set of records every time our API call runs.
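For illustration only, pulling that task history from Python could look roughly like this; the host, credentials, group/project names and the exact endpoint path are assumptions to verify against the Matillion task API documentation linked above:

    import requests

    BASE = "https://my-matillion-host/rest/v1"   # placeholder instance URL
    AUTH = ("api-user", "api-password")          # placeholder credentials

    # Illustrative path: filter completed tasks by start date (date only, as noted above).
    url = f"{BASE}/group/name/MyGroup/project/name/MyProject/task/filter/by/start/range/date/2023-01-15"

    response = requests.get(url, auth=AUTH)
    response.raise_for_status()

    for task in response.json():
        print(task.get("jobName"), task.get("state"), task.get("startTime"))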