Can Google add org_unit_path to the usage table schema of the Export Audit Logs to BigQuery? - google-cloud-platform

I am not sure where this lies in the Google teams. But we would like to see the org_unit_path added to the usage table schema and populated by the Audit Log Export from Google Workspace to BigQuery. It already exists in the activity table but not in the usage table. We could join the two tables to get the information but that should not be necessary and also potentially misses accounts that are not active but still need to be reported on.

Related

Row level changes captured via AWS DMS

I am trying to migrate the database using AWS DMS. Source is Azure SQL server and destination is Redshift. Is there any way to know the rows updated or inserted? We dont have any audit columns in source database.
Redshift doesn’t track changes and you would need to have audit columns to do this at the user level. You may be able to deduce this from Redshift query history and save data input files but this will be solution dependent. Query history can be achieved in a couple of ways but both require some action. The first is to review the query logs but these are only saved for a few days. If you need to look back further than this you need a process to save these tables so the information isn’t lost. The other is to turn on Redshift logging to S3 but this would need to be turned on before you run queries on Redshift. There may be some logging from DMS that could be helpful but I think the bottom line answer is that row level change tracking is not something that is on in Redshift by default.

Can't find BigQuery Logs for GA4 events_intraday tables

I am trying to create a trigger for a Cloud Function to copy events_intraday table data as soon as new data has been exported.
So far I have been following this answer to generate a sink from Cloud Logging to Pub/Sub.
I have only been able to find logs for events_YYYMMDD tables but none for events_intraday_YYYYMMDD neither on Cloud Logging nor on BigQuery Job History (Here are my queries for events tables and events_intraday tables on Cloud Logging).
Am I looking at the wrong place? How is it possible for the table to be updated without any logs being generated?
Update: There is one(1) log generated per day when the table is created but "table update" logs are yet to be found.
Try
protoPayload.authorizationInfo.permission="bigquery.tables.create"
protoPayload.methodName="google.cloud.bigquery.v2.TableService.InsertTable"
protoPayload.resourceName : "projects/'your_project'/datasets/'your_dataset'/tables/events_intraday_"

Dataset shows only 5 event tables after re-linking Firebase with another Google Analytics account

Recently unlinked and re-linked a Firebase project with a different Google Analytics account.
The BigQuery integration configured to export GA data created the new dataset and data started populating into that.
The old dataset corresponding to the unlinked, "default" GA account, which contained ~2 years of data is still accessible in the BigQuery UI, however only the 5 most recent event_ tables are visible in the dataset. (5 days worth of event data)
Is it possible to extract historical data from the old, unlinked dataset?
What I could suggest, it's to do some queries for further validate the data that you have within your BigQuery dataset.
In this case, I would start by getting the dates for each table to see the amount (days) of data contained on the dataset.
SELECT event_date
FROM `firebase-public-project.analytics_153293282.events_*`
GROUP BY event_date ORDER BY event_date
EDIT
A better way to do this, and get all the tables within the dataset, is using the bq command line tool, see reference here.
bq ls firebase-public-project:analytics_153293282
You'll get something like this:
You could also do a COUNT(event_date), so you can see how many records you have per day, and compare this to the content that you have or you can see on your Firebase project.
SELECT event_date, COUNT(event_date) ...
On the case that there's data missing, you could use table decorators, to try to recover that data, see example here.
About the table's expiration date you can see this, in short, expiration time can be set by default at dataset level and it would be applied for new tables (existing tables require a manual update of their expiration time one by one), and expiration time can be set during the creation of the table. To see if there was any change on the expiration time you could look into your logs for protoPayload.methodName="tableservice.update", and see if there was set an expireTime as follows:
tableUpdateRequest: {
resource: {
expireTime: "2020-12-31T00:00:00Z"
...
}
}
Besides this, if you have a GCP support plan, you could reach them looking for further assistance on what could have happened with your tables on that dataset. Otherwise, you could open an issue tracker. Keep in mind that Firebase doesn't delete your data when unlinking a Firebase project from BigQuery, so in theory the data should be there.

GCP: How to see Stats of Insert, Update and Delete statements in Console or from backend in Spanner

In spanner Console we have QUERY Stats which shows stats of Select queries in Spanner from below tables. But we don't see Stats of Insert, Update and Delete statements in Spanner Console. Neither is that available in below tables
Tables: QUERY_STATS_TOTAL_MINUTE, QUERY_STATS_TOTAL_10MINUTE, QUERY_STATS_TOP_MINUTE, QUERY_STATS_TOP_10MINUTE, QUERY_STATS_TOP_HOUR,QUERY_STATS_TOTAL_HOUR
I tried searching for any related tables but couldn't find them
Can someone please share the list of Spanner tables where we can find Stats of I, U, D statements.
Also, I think it should also be provided by GCP on console for quick analysis.
Indeed those tables and views for DMLs are very useful for troubleshooting but DML Stats (and its backing tables) equivalent to Query Stats are not there yet. I have added this feature request to our backlog.
Unfortunately, Cloud Spanner’s query statistics does not include DML queries (insert, update and delete) and will not have any tables related to those statements.
It is mentioned in these Google Cloud Viewing query statistics in the console and Query statistics tables articles as a note at the top of both pages.

How to retrive Bigquery billing details from GCP console or UI?

My team is using bigquery for our product development. Other bill of Rs 5159 got generated for one days transaction.
I checked the transaction details and
BigQuery Analysis: 15.912 Tebibytes [Currency conversion: USD to INR using rate 69.155]
Is is possible to somehow find out more details about the transactions like table name, queries that were executed and exact time of execution?
BigQuery automatically sends audit logs to Stackdriver Logging and provide the ability to do aggregated analysis on logs data. You can see BigQuery schema for exported logs for details
As quick example: Query cost breakdown by identity
This query shows estimated query costs by user identity. It estimates costs based on the list price for on-demand queries in the US. This pricing may not be accurate for other locations or for customers leveraging flat-rate billing.
#standardSQL
WITH data as
(
SELECT
protopayload_auditlog.authenticationInfo.principalEmail as principalEmail,
protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent AS jobCompletedEvent
FROM
`MYPROJECTID.MYDATASETID.cloudaudit_googleapis_com_data_access_YYYYMMDD`
)
SELECT
principalEmail,
FORMAT('%9.2f',5.0 * (SUM(jobCompletedEvent.job.jobStatistics.totalBilledBytes)/POWER(2, 40))) AS Estimated_USD_Cost
FROM
data
WHERE
jobCompletedEvent.eventName = 'query_job_completed'
GROUP BY principalEmail
ORDER BY Estimated_USD_Cost DESC
As of last year BigQuery provides INFORMATION_SCHEMA tables that also give access to job information via JOBS_BY_* views. The INFORMATION_SCHEMA.JOBS_BY_USER and INFORMATION_SCHEMA.JOBS_BY_PROJECT views even include the exact query alongside the processed bytes. It might not be 100% accuracte (because bytes processed != bytes billed) but it should allow you to gain a good overview over your costs, which queries triggered them an who the initiator was.
Example
SELECT
creation_time,
job_id,
project_id,
user_email,
total_bytes_processed,
query
FROM
`region-us`.INFORMATION_SCHEMA.JOBS_BY_USER
The most "efficient" way to keep an eye on the cost is using the INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION view as it automatically includes all projects of the organization. You need to be Organization Owner or Organization Administrator to use that view, though.
From there you can figure out which jobs were the most expensive (= get their job id) and form there drill down via JOBS_BY_PROJECT to get the exact query.
See https://www.pascallandau.com/bigquery-snippets/monitor-query-costs/ for a more comprehensive explanation.
You need to Export Billing Data to BigQuery
Tools for monitoring, analyzing and optimizing cost have become an important part of managing development. Billing export to BigQuery enables you to export your daily usage and cost estimates automatically throughout the day to
export data to a CSV,JSON file
However, if you use regular file export, you should be aware that regular file export captures a smaller dataset than export to BigQuery. For more information about regular file export and the data it captures, see Export Billing Data to a File.
to a BigQuery dataset you specify.
After you enable BigQuery export, it might take a few hours to start seeing your data. Billing data automatically exports your data to BigQuery in regular intervals, but the frequency of updates in BigQuery varies depending on the services you're using. Note that BigQuery loads are ACID compliant, so if you query the BigQuery billing export dataset while data is being loaded into it, you will not encounter partially loaded data.
Follow the step by step guide: How to enable billing export to BigQuery
https://cloud.google.com/billing/docs/how-to/export-data-bigquery