I am trying to create a trigger for a Cloud Function to copy events_intraday table data as soon as new data has been exported.
So far I have been following this answer to generate a sink from Cloud Logging to Pub/Sub.
I have only been able to find logs for events_YYYYMMDD tables but none for events_intraday_YYYYMMDD, either in Cloud Logging or in the BigQuery job history (here are my queries for events tables and events_intraday tables in Cloud Logging).
Am I looking in the wrong place? How is it possible for the table to be updated without any logs being generated?
Update: there is one (1) log generated per day, when the table is created, but "table update" logs are still nowhere to be found.
Try
protoPayload.authorizationInfo.permission="bigquery.tables.create"
protoPayload.methodName="google.cloud.bigquery.v2.TableService.InsertTable"
protoPayload.resourceName : "projects/'your_project'/datasets/'your_dataset'/tables/events_intraday_"
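If that filter surfaces the intraday table-creation entries, one way to drive a Pub/Sub-triggered Cloud Function from them is to create a log sink with the same filter. Below is a minimal sketch using the google-cloud-logging Python client, assuming the Pub/Sub topic already exists; the project, dataset, topic and sink names are placeholders.

from google.cloud import logging

# Placeholders: adjust project, dataset, topic and sink names to your environment.
LOG_FILTER = (
    'protoPayload.methodName="google.cloud.bigquery.v2.TableService.InsertTable" '
    'protoPayload.resourceName:"projects/your_project/datasets/your_dataset/tables/events_intraday_"'
)

client = logging.Client(project="your_project")
sink = client.sink(
    "events-intraday-created",
    filter_=LOG_FILTER,
    destination="pubsub.googleapis.com/projects/your_project/topics/events-intraday",
)
sink.create()
# After creation, grant the sink's writer identity the Pub/Sub Publisher role on the topic.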
I have been working with AWS Athena for a while and need to create a backup and version control of the views. I'm trying to build an automation for the backup to run daily and capture all the views.
I tried to find a way to copy all the views created in Athena using boto3, but I couldn't find one. With DBeaver I can see and export a view's SQL script, but from what I've seen only one at a time, which does not serve the goal.
I'm open to any approach.
I tried to find an answer to my question in the boto3 and DBeaver documentation; reading threads on Stack Overflow and some Google searching did not take me very far.
Views and Tables are stored in the AWS Glue Data Catalog.
You can Query the AWS Glue Data Catalog - Amazon Athena to obtain information about tables, partitions, columns, etc.
However, if you want to obtain the DDL that was used to create the views, you will probably need to use SHOW CREATE TABLE [db_name.]table_name:
Analyzes an existing table named table_name to generate the query that created it.
Have you tried using get_query_results in boto3?
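Putting those two pieces together (list the views through the Glue Data Catalog, then pull the DDL for each one), a daily backup could look roughly like the sketch below. This is a minimal sketch, not a definitive implementation: the database name and S3 output location are placeholders, the TableType == "VIRTUAL_VIEW" check is how views are distinguished from tables in Glue, and SHOW CREATE VIEW is used as the view counterpart of SHOW CREATE TABLE.

import time
import boto3

# Placeholders: adjust the database name and the S3 output location.
DATABASE = "my_database"
OUTPUT = "s3://my-bucket/athena-results/"

glue = boto3.client("glue")
athena = boto3.client("athena")

def run_query(sql):
    """Run an Athena query, wait for it to finish, and return the result rows."""
    qid = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": DATABASE},
        ResultConfiguration={"OutputLocation": OUTPUT},
    )["QueryExecutionId"]
    while True:
        state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Query {qid} ended in state {state}")
    return athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]

# Views appear in the Glue Data Catalog with TableType == "VIRTUAL_VIEW".
paginator = glue.get_paginator("get_tables")
for page in paginator.paginate(DatabaseName=DATABASE):
    for table in page["TableList"]:
        if table.get("TableType") == "VIRTUAL_VIEW":
            rows = run_query(f"SHOW CREATE VIEW {table['Name']}")
            ddl = "\n".join(r["Data"][0].get("VarCharValue", "") for r in rows)
            with open(f"{table['Name']}.sql", "w") as f:
                f.write(ddl)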
I am not sure which Google team this falls under, but we would like to see the org_unit_path added to the usage table schema and populated by the Audit Log Export from Google Workspace to BigQuery. It already exists in the activity table but not in the usage table. We could join the two tables to get the information, but that should not be necessary, and it also potentially misses accounts that are not active but still need to be reported on.
I need to log all the queries on my SQL instances, filter those that seem to fetch too many rows, and archive them for a period of time.
The documentation says this about "Data Access audit logs":
(2) All SQL queries executed on the database instance
So I've set out to enable access audit logs for my Cloud SQL server.
I have checked Data Read on the Audit Logs API page as described here, but I cannot seem to find it anywhere in the Logs Explorer. I have already checked the log name "projects/PROJECT_NAME/logs/cloudaudit.googleapis.com%2Fdata_access" to no avail.
What am I missing? What am I doing wrong?
To view these query logs in Cloud Logging, you need to activate the general_log flag in the MySQL flags feature page.
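Once the flag is active, the general query log is written to Cloud Logging under the instance's cloudsql_database resource rather than under the data_access audit log. A minimal sketch of reading it with the google-cloud-logging Python client follows; the project and instance names are placeholders, and the log name shown is the one MySQL general logs typically use, so verify the exact name in the Logs Explorer for your instance.

from google.cloud import logging

# Placeholders: adjust project and instance names; confirm the exact log name
# in the Logs Explorer before relying on it.
client = logging.Client(project="PROJECT_NAME")
log_filter = (
    'resource.type="cloudsql_database" '
    'resource.labels.database_id="PROJECT_NAME:INSTANCE_NAME" '
    'logName="projects/PROJECT_NAME/logs/cloudsql.googleapis.com%2Fmysql-general.log"'
)
for entry in client.list_entries(filter_=log_filter, order_by=logging.DESCENDING, max_results=50):
    print(entry.timestamp, entry.payload)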
When running
gcloud logging sinks list
it seems I have several sinks for my project
▶ gcloud logging sinks list
NAME DESTINATION FILTER
myapp1 bigquery.googleapis.com/projects/myproject/datasets/myapp1 resource.type="k8s_container" resource.labels.cluster_name="mygkecluster" resource.labels.container_name="myapp1"
myapp2 bigquery.googleapis.com/projects/myproject/datasets/myapp2 resource.type="k8s_container" resource.labels.cluster_name="mygkecluster" resource.labels.container_name="myapp2"
myapp3 bigquery.googleapis.com/projects/myproject/datasets/myapp3 resource.type="k8s_container" resource.labels.cluster_name="mygkecluster" resource.labels.container_name="myapp3"
However, when I navigate in my BigQuery console, I don't see the corresponding datasets.
Is there a way to import these sinks as datasets so that I can run queries against them?
This guide on creating BigQuery datasets does not list how to do so from a log sink (unless I am missing something)
Also, any idea why the above datasets are not displayed when using the bq ls command?
Firstly, be sure you are in the correct project. If not, you can pin a dataset from an external project by clicking the PIN button (and you need to have enough permission for this).
Secondly, the Cloud Logging sink to BigQuery doesn't create the dataset, only the tables. So, if you have created the sinks without the dataset, your sinks aren't running (or are running in error). Here are more details:
BigQuery: Select or create the particular dataset to receive the exported logs. You also have the option to use partitioned tables.
In general, what you expect this feature to do is right: using BigQuery as a log sink allows you to query the logs with BQ. As for the problem you're facing, I believe it has to do with using the web console vs. gcloud.
When using BigQuery as a log sink, there are 2 ways to specify a dataset:
point to an existing dataset
create a new dataset
When creating a new sink via the web console, there's an option to have Cloud Logging create a new dataset for you as well. However, gcloud logging sinks create does not automatically create a dataset for you; it only creates the log sink. It also does not seem to validate whether the specified dataset exists.
To resolve this, you could either use the web console for the task or create the datasets on your own. There's nothing special about creating a BQ dataset to be a log sink destination compared to creating a BQ dataset for any other purpose. Create a BQ dataset, then create a log sink that points to the dataset, and you're good to go (a rough sketch follows).
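As a rough sketch of that order of operations with the Python clients (dataset first, then the sink): the project, dataset and sink names are placeholders, and the filter is copied from one of the sinks above. If the sink already exists, creating the missing dataset and granting the sink's writer identity access to it should be enough.

from google.cloud import bigquery
from google.cloud import logging

PROJECT = "myproject"  # placeholder

# 1) Create the BigQuery dataset the sink will write into (no-op if it already exists).
bq = bigquery.Client(project=PROJECT)
bq.create_dataset(bigquery.Dataset(f"{PROJECT}.myapp1"), exists_ok=True)

# 2) Create the sink pointing at that dataset.
log_client = logging.Client(project=PROJECT)
sink = log_client.sink(
    "myapp1",
    filter_=(
        'resource.type="k8s_container" '
        'resource.labels.cluster_name="mygkecluster" '
        'resource.labels.container_name="myapp1"'
    ),
    destination=f"bigquery.googleapis.com/projects/{PROJECT}/datasets/myapp1",
)
sink.create()
# The sink's writer identity still needs the BigQuery Data Editor role on the dataset.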
Conceptually, the different products on GCP (BigQuery, Cloud Logging) run independently; the log sink in Cloud Logging is simply an object that pairs up a filter and a destination, but it does not own or manage the destination resource (e.g. a BQ dataset). It's just that the web console provides some extra integration to make things easier.
I have a BigQuery table and an external data import process that should add entries every day. I need to verify that the table contains current data (with a timestamp of today). Writing the SQL query is not a problem.
My question is how best to set up such monitoring in GCP. Can Stackdriver execute custom BigQuery SQL? Or would a Cloud Function be more suitable? An App Engine application with a cron job? What's the best practice?
Not sure what the best practice is here, but one simple solution is to use a BigQuery scheduled query: schedule the query, make it fail if something is wrong using the ERROR() function, and configure the scheduled query to notify you (it sends an email) if it fails.
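The query body for such a scheduled query could be as simple as a freshness check that calls ERROR() when no rows carry today's date. The sketch below runs the same SQL through the BigQuery Python client just to show the shape of the check; the table and timestamp column names are placeholders, and in practice you would paste the SQL into a scheduled query and rely on its email notification.

from google.cloud import bigquery

# Placeholders: table and timestamp column names are assumptions.
FRESHNESS_SQL = """
SELECT IF(
  (SELECT COUNT(*)
   FROM `my_project.my_dataset.my_table`
   WHERE DATE(import_timestamp) = CURRENT_DATE()) > 0,
  "ok",
  ERROR("no rows with today's timestamp")
) AS freshness
"""

client = bigquery.Client(project="my_project")
client.query(FRESHNESS_SQL).result()  # raises if ERROR() fires, i.e. the table has no current data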