I need to log all the queries on my SQL instances, filter those that seem to fetch too many rows, and archive them for a period of time.
The documentation says this about "Data Access audit logs":
(2) All SQL queries executed on the database instance
So I set out to enable Data Access audit logs for my Cloud SQL server.
I have checked "Data Read" on the Audit Logs API page as said here, but I cannot seem to find anything in the Logs Explorer. I have already checked log names for "projects/PROJECT_NAME/logs/cloudaudit.googleapis.com%2Fdata_access" to no avail.
What am I missing? What am I doing wrong?
To view the queries in Cloud Logging you need to enable the general_log flag on the MySQL flags page of your instance.
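If you prefer to set the flags programmatically rather than through the console, here is a minimal sketch using the Cloud SQL Admin API via the Google API Python client; the project and instance names are placeholders, and the same change can be made with gcloud sql instances patch.

# Minimal sketch: enable general_log and log_output on a Cloud SQL instance.
# "my-project" and "my-instance" are placeholders; the instance may restart
# when flags change, and the databaseFlags list replaces any flags already set.
from googleapiclient import discovery
import google.auth

credentials, _ = google.auth.default()
service = discovery.build("sqladmin", "v1beta4", credentials=credentials)

body = {
    "settings": {
        "databaseFlags": [
            {"name": "general_log", "value": "on"},
            {"name": "log_output", "value": "FILE"},
        ]
    }
}
response = service.instances().patch(
    project="my-project", instance="my-instance", body=body
).execute()
print(response)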
Related
I am trying to migrate a database using AWS DMS. The source is Azure SQL Server and the destination is Redshift. Is there any way to know which rows were updated or inserted? We don't have any audit columns in the source database.
Redshift doesn't track changes, and you would need audit columns to do this at the user level. You may be able to deduce this from Redshift query history and saved data input files, but this will be solution dependent. Query history can be obtained in a couple of ways, but both require some action. The first is to review the query logs, but these are only saved for a few days; if you need to look back further than that, you need a process to save these tables so the information isn't lost. The other is to turn on Redshift logging to S3, but this would need to be enabled before you run queries on Redshift. There may be some logging from DMS that could be helpful, but I think the bottom-line answer is that row-level change tracking is not something that is on in Redshift by default.
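For the first option, recent history lives in Redshift system tables such as STL_QUERY, which retains only a few days of data. Below is a rough sketch of pulling it out before it rotates, assuming network access to the cluster via the standard Postgres driver; the endpoint and credentials are placeholders.

# Rough sketch: copy recent query history out of STL_QUERY before it expires.
import psycopg2  # Redshift speaks the Postgres wire protocol

conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",  # placeholder
    port=5439, dbname="dev", user="awsuser", password="...",
)
cur = conn.cursor()
cur.execute("""
    SELECT query, starttime, endtime, TRIM(querytxt)
    FROM stl_query
    WHERE starttime > DATEADD(day, -1, GETDATE())
    ORDER BY starttime DESC
""")
for query_id, start, end, text in cur.fetchall():
    print(query_id, start, end, text)  # persist these rows somewhere durable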
I am trying to create a trigger for a Cloud Function to copy events_intraday table data as soon as new data has been exported.
So far I have been following this answer to generate a sink from Cloud Logging to Pub/Sub.
I have only been able to find logs for events_YYYYMMDD tables but none for events_intraday_YYYYMMDD, either in Cloud Logging or in the BigQuery job history (here are my queries for events tables and events_intraday tables in Cloud Logging).
Am I looking in the wrong place? How is it possible for the table to be updated without any logs being generated?
Update: There is one (1) log generated per day, when the table is created, but "table update" logs are yet to be found.
Try the following Cloud Logging filter:
protoPayload.authorizationInfo.permission="bigquery.tables.create"
protoPayload.methodName="google.cloud.bigquery.v2.TableService.InsertTable"
protoPayload.resourceName : "projects/'your_project'/datasets/'your_dataset'/tables/events_intraday_"
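Lines in a Logs Explorer query are implicitly combined with AND. If you then want Pub/Sub notifications for matching entries, as in the sink approach you linked, here is a rough sketch using the Cloud Logging Python client; the sink, topic, project, and dataset names are placeholders.

# Rough sketch: route matching audit log entries to a Pub/Sub topic via a sink.
from google.cloud import logging as cloud_logging

client = cloud_logging.Client(project="your_project")  # placeholder
combined_filter = (
    'protoPayload.authorizationInfo.permission="bigquery.tables.create" '
    'AND protoPayload.methodName="google.cloud.bigquery.v2.TableService.InsertTable" '
    'AND protoPayload.resourceName:"projects/your_project/datasets/your_dataset/tables/events_intraday_"'
)
sink = client.sink(
    "events-intraday-sink",  # placeholder sink name
    filter_=combined_filter,
    destination="pubsub.googleapis.com/projects/your_project/topics/your_topic",
)
if not sink.exists():
    sink.create()  # grant the sink's writer identity publish rights on the topic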
I have not been using my Cloud SQL instance for the last 7 days and I am not firing any queries. Then why does this graph show I am making 10 queries per second? What does this graph signify?
That graph signifies the number of statements executed by the server. This variable includes statements executed within stored programs.
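That description matches MySQL's Queries status variable (Questions is the related counter that excludes stored-program statements). If you want to read the raw counters yourself, here is a small sketch, assuming a connection through the Cloud SQL Auth proxy with mysql-connector-python; the credentials are placeholders.

# Small sketch: read the server-side statement counters directly.
import mysql.connector  # assumes mysql-connector-python is installed

conn = mysql.connector.connect(host="127.0.0.1", user="root", password="...")
cur = conn.cursor()
# 'Queries' counts all statements, including those inside stored programs;
# 'Questions' counts only statements sent by clients.
cur.execute("SHOW GLOBAL STATUS WHERE Variable_name IN ('Queries', 'Questions')")
for name, value in cur.fetchall():
    print(name, value)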
To get a wider view of which queries your Cloud SQL server is executing, you can:
1. Enable the Cloud SQL flags general_log=ON and log_output=FILE [1]
2. Go to the Logs Viewer [2], select your Cloud SQL instance, and select All logs = cloudsql.googleapis.com/mysql-general.log
You will then see in the Stackdriver logs the queries your server is executing behind the scenes.
You can look these queries up further in the MySQL reference manual.
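To fetch these entries programmatically instead of through the console, here is a small sketch with the Cloud Logging Python client; PROJECT_NAME is a placeholder.

# Small sketch: list general log entries for a Cloud SQL instance.
from google.cloud import logging as cloud_logging

client = cloud_logging.Client(project="PROJECT_NAME")  # placeholder
log_filter = 'logName="projects/PROJECT_NAME/logs/cloudsql.googleapis.com%2Fmysql-general.log"'
for entry in client.list_entries(filter_=log_filter, page_size=50):
    print(entry.timestamp, entry.payload)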
If you have Cloud Audit logs enabled, then that could be the reason why you are seeing these results in the graph. To disable audit logs, go to [3] and uncheck the Cloud SQL row.
Important:
1. When you set, remove, or modify a flag for a database instance, the database might be restarted. The flag value is then persisted for the instance until you remove it. If the instance is the source of a replica, the replica will also restart to align with the current configuration of the instance [1].
After looking into cloudsql.googleapis.com/mysql-general.log, you may want to disable the general_log and log_output flags to stop further charges due to the log size. I would also recommend you visit the tips for general log flags [4].
[1] https://cloud.google.com/sql/docs/mysql/flags#config
[2] https://console.cloud.google.com/logs/viewer
[3] https://console.cloud.google.com/iam-admin/audit
[4] https://cloud.google.com/sql/docs/mysql/flags#tips
For context, we would like to visualize our data in Google Data Studio; this dataset receives more entries each week. I have tried hosting our data sets in Google Drive, but it seems that they're too large and this slows down Google Data Studio (the file is only 50 MB, am I doing something wrong?).
I have loaded our data into Google Cloud Storage --> Google BigQuery, and connected Google Data Studio to my BigQuery table. This has allowed me to use the Google Data Studio dashboard much more quickly!
I'm not sure what the best way is to update our data weekly in Google Cloud/BigQuery. I have found a slow way to do this by uploading the new weekly data to Google Cloud Storage, then appending the data to my table manually in BigQuery, but I'm wondering if there's a better way to do this (or at least a more automated one)?
I'm open to any suggestions, and if you think that BigQuery/Google Cloud Storage is not the answer for me, please let me know!
If I understand your question correctly, you want to automate the query that populates your table, which is connected to Data Studio.
If this is the case, then you can use scheduled queries in BigQuery. A scheduled query lets you define a query whose results are written to a destination table. In particular, you can specify different repetition rules (as frequent as every 15 minutes) and execution options, as well as destination writing options (destination table, write mode: append or truncate).
In order to use scheduled queries, your account must have the right permissions. You can have a look at the following documentation to better understand how to use them [1].
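For illustration, here is a minimal sketch that creates such a scheduled query with the BigQuery Data Transfer Python client; the project, dataset, table, query, and schedule are placeholders, and it assumes the Data Transfer API is enabled.

# Minimal sketch: schedule a weekly query that appends to a destination table.
from google.cloud import bigquery_datatransfer

client = bigquery_datatransfer.DataTransferServiceClient()
transfer_config = bigquery_datatransfer.TransferConfig(
    destination_dataset_id="your_dataset",  # placeholder
    display_name="Weekly append",
    data_source_id="scheduled_query",
    params={
        "query": "SELECT * FROM `your_project.staging.weekly_upload`",  # placeholder
        "destination_table_name_template": "your_table",
        "write_disposition": "WRITE_APPEND",
    },
    schedule="every monday 09:00",
)
transfer_config = client.create_transfer_config(
    parent=client.common_project_path("your_project"),
    transfer_config=transfer_config,
)
print("Created scheduled query:", transfer_config.name)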
Also, please note that on the front end the updated data in the BigQuery table will only appear in Data Studio on each refresh (click the refresh button in Data Studio). To refresh the front-end visualization automatically, you can use the following plugin [2] or automate the click on the refresh button through browser console commands.
[1] https://cloud.google.com/bigquery/docs/scheduling-queries
[2] https://chrome.google.com/webstore/detail/data-studio-auto-refresh/inkgahcdacjcejipadnndepfllmbgoag?hl=en
We are using Power BI online. We purchased a Pro license.
I checked my login and I am in the Global Administrator role.
Still, I found the Refresh schedule option on my dataset disabled. Why?
It is disabled because your dataset is not refreshable. You should check the What's supported? section of the Configuring scheduled refresh article. You didn't mention what the source of your dataset is. In general, the Power BI service needs access to the data source to be able to refresh the dataset. This means that either your data source should be somewhere in the cloud (e.g. Azure SQL Database, a file on OneDrive or SharePoint Online, etc.), or there must be a data gateway installed and configured.