I have a database on a Google Cloud SQL instance. I want to connect the database to pgBadger, which is used to analyse queries. I have tried various methods, but they all ask for the log file location.
I believe there are two major limitations preventing an easy setup that would allow you to use pgBadger with logs generated by a Cloud SQL instance.
The first is the fact that Cloud SQL logs are processed by Stackdriver and can only be accessed through it. It is actually possible to export logs from Stackdriver, however the resulting format and destination will still not meet the requirements for using pgBadger, which leads to the second major limitation.
Cloud SQL does not allow changes to all of the required configuration directives. The major one is log_line_prefix, which currently does not follow the format pgBadger requires, and it is not possible to change it. You can see which flags are supported in Cloud SQL in the Supported flags documentation.
In order to use pgBadger you would need to reformat the log entries while exporting them to a location where pgBadger could do its job. Stackdriver can stream the logs through Pub/Sub, so you could develop a small app to process and store them in the format you need.
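As a purely illustrative sketch (the project, subscription, output path, and the assumption that the raw Postgres log line arrives in the entry's textPayload field are all placeholders, not something Cloud SQL guarantees), such an app could pull the exported entries from a Pub/Sub subscription fed by a log sink and append them to a file that you later feed to pgBadger:

import json

from google.cloud import pubsub_v1

PROJECT_ID = "my-project"                       # placeholder project id
SUBSCRIPTION_ID = "cloudsql-pg-logs-sub"        # placeholder subscription on the log sink's topic
OUTPUT_FILE = "/var/log/pgbadger/postgres.log"  # file that pgBadger will parse later

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID)

def callback(message):
    # Each Pub/Sub message exported by a log sink carries one LogEntry as JSON.
    entry = json.loads(message.data.decode("utf-8"))
    # Assumption: the raw Postgres log line is in textPayload; rewrite it here so that
    # it matches the prefix you will pass to pgBadger with --prefix.
    line = entry.get("textPayload", "")
    timestamp = entry.get("timestamp", "")
    with open(OUTPUT_FILE, "a") as f:
        f.write(f"{timestamp} {line}\n")
    message.ack()

streaming_pull = subscriber.subscribe(subscription_path, callback=callback)
streaming_pull.result()  # block forever, consuming and rewriting log entries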
I hope this helps.
I have BigQuery stored procedures that run on some GCS objects and do magic with them. The procedures work perfectly when run manually, but I want to call them from NiFi. I have worked with HANA and know that I need a JDBC driver to connect and run queries.
I could either use the ExecuteProcess processor or the ExecuteSQL processor; to be honest, I don't know which.
I am not sure how to achieve this in NiFi with BigQuery stored procedures. Could anyone help me with this?
Thanks in advance!!
Updated with a new error, in case someone could help.
Option 1: ExecuteProcess
The closest thing to "running it manually" is installing the Google Cloud SDK and executing this within ExecuteProcess (note that CALL requires standard SQL, so the --use_legacy_sql=false flag is needed):
bq query --use_legacy_sql=false 'CALL STORED_PROCEDURE(ARGS)'
or
bq query --use_legacy_sql=false 'SELECT STORED_PROCEDURE(ARGS)'
Option 2: ExecuteSQL
If you want to use ExecuteSQL with NiFi to call the stored procedure, you'll need the BigQuery JDBC driver.
Both 'select' and 'call' methods will work with BigQuery.
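For example (the project, dataset, procedure, and argument names below are made up), the statement you configure in ExecuteSQL, or pass to bq query, could look roughly like this:

-- CALL invokes a stored procedure; SELECT works for a function that returns a value
CALL `my-project.my_dataset.process_gcs_object`('gs://my-bucket/input/file.csv');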
Which option is better?
I believe ExecuteSQL is easier than ExecuteProcess.
Why? Because with ExecuteProcess you need to install the Google Cloud SDK on every system that might run the command, and you must pass the Google Cloud credentials to them.
That means sharing the job is not easy.
Plus, this might require administrator rights on all the machines.
In the ExecuteSQL case you'll need to:
1 - Copy the JDBC driver to the lib directory inside your NiFi installation.
2 - Connect to BigQuery using pre-generated access/refresh tokens - see the JDBC Driver for Google BigQuery Install and Configuration Guide - that's OAuth type 2. An example connection URL is sketched right after this list.
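As a hedged illustration only (the parameter names follow the Simba install guide, but double-check them against the driver version you download; the project id, client id/secret, and refresh token are placeholders), the connection URL could look roughly like:

jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=my-project;OAuthType=2;OAuthClientId=<client-id>;OAuthClientSecret=<client-secret>;OAuthRefreshToken=<refresh-token>

The driver class name to configure alongside it is typically com.simba.googlebigquery.jdbc42.Driver, but verify it against the JAR you drop into lib.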
The good part is that when you export the flow, the credentials are embedded in it: no need to mess with credentials.json files, etc. (this could also be bad from a security standpoint).
Distributing JDBC JARs is easier than installing the Google Cloud SDK: just drop a file into the lib folder. If you need it on more than one node, you can scp/sftp it, or distribute it with Ambari.
Since there is a nightly backup of the SQL instance, we are wondering about a good way to restore this backup to a different database in the same MySQL server instance. We have prod_xxxx for all our production databases AND staging_xxxx for all our staging databases (yes, not great, since they are all on the same MySQL instance right now).
Anyway, we would love to restore all tables/constraints/etc. and the data from prod_incomingdb to staging_incomingdb. Is this possible in Cloud SQL?
Since this is on a production instance, I recommend performing a backup before you start, in order to avoid any data corruption.
There is no direct way to clone a database within the same instance (this is a missing feature in MySQL).
I followed this path in order to successfully clone a database within the same MySQL Cloud SQL instance; a rough command-line equivalent of the whole flow is sketched after the steps.
1.- Create a dump of the desired database using the Google Cloud Console (Web UI) by following these steps.
*It is very important to dump only the desired database, in SQL format; please do not select multiple databases for the dump.
After the process finishes, the dump will be available in a Google Cloud Storage bucket.
2.- Download the dump file to a Compute Engine VM or to any local machine with Linux.
3.- Replace the database name (the old one) in the USE clauses.
I used this sed command on my downloaded dump to change the database names:
sed -i 's/USE `employees`;/USE `emp2`;/g' employees.sql
*This can take a few seconds, depending on the size of your file.
4.- Upload the updated file to the Cloud Storage bucket.
5.- Create a new empty database on your Cloud SQL instance; in this case my target database is called emp2.
6.- Import the modified dump by following these steps.
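For reference, here is a rough command-line version of the same flow, assuming an instance called my-instance, a bucket called my-bucket, and the employees → emp2 example above (all names and paths are placeholders; the console steps linked above remain the authoritative procedure):

# 1. Export only the source database, in SQL format, to a bucket
gcloud sql export sql my-instance gs://my-bucket/employees.sql --database=employees
# 2. Download the dump
gsutil cp gs://my-bucket/employees.sql .
# 3. Point the USE clauses at the new database name
sed -i 's/USE `employees`;/USE `emp2`;/g' employees.sql
# 4. Upload the modified dump
gsutil cp employees.sql gs://my-bucket/employees-emp2.sql
# 5. Create the empty target database
gcloud sql databases create emp2 --instance=my-instance
# 6. Import the modified dump into it
gcloud sql import sql my-instance gs://my-bucket/employees-emp2.sql --database=emp2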
I could not figure out the nightly backups, as restoring one seems to restore an entire instance, so I think the answer to the above is no. I did find out that I can export and then import (not exactly what I wanted, since I didn't want to be exporting our DB during the day, but for now we may go with that and automate a nightly export later).
I have read many articles and solutions about scheduling queries to external storage from Google BigQuery, but they didn't seem very clear.
Note: My company has a subscription only to Google BigQuery and not to the complete set of cloud services (Google Cloud Platform).
I know how to do it manually, but I am looking to automate the process, since I need the same data every week.
Any suggestions will be appreciated. Thank you.
Option 1
You can use Apache Airflow, which provides the option to create scheduled tasks on top of BigQuery using the BigQuery operators.
You can find the basic steps required to start setting this up in this link.
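A very rough DAG sketch (import paths and argument names vary between Airflow and google-provider versions, and the project, dataset, table, bucket, and schedule below are placeholders):

from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.bigquery_to_gcs import BigQueryToGCSOperator

with DAG(
    dag_id="weekly_bq_export",
    schedule_interval="@weekly",   # run once a week
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    export_table = BigQueryToGCSOperator(
        task_id="export_table",
        source_project_dataset_table="my-project.my_dataset.my_table",             # placeholder table
        destination_cloud_storage_uris=["gs://my-bucket/exports/my_table-*.csv"],  # placeholder bucket
        export_format="CSV",
        print_header=True,
    )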
Option 2
You can use the BigQuery command-line tool to export your data as you would from the web UI, for example:
bq --location=[LOCATION] extract --destination_format [FORMAT] --compression [COMPRESSION_TYPE] --field_delimiter [DELIMITER] --print_header [BOOLEAN] [PROJECT_ID]:[DATASET].[TABLE] gs://[BUCKET]/[FILENAME]
Once you get this working, you can use any scheduler of your liking to run this job.
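For example (all values are placeholders), a plain cron entry on a machine where the bq tool is installed and authenticated could look like:

0 6 * * 1 bq --location=US extract --destination_format CSV --compression GZIP --field_delimiter ',' --print_header true 'my-project:my_dataset.my_table' 'gs://my-bucket/exports/my_table-*.csv.gz'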
BTW: Airflow has an operator which enables you to run the command-line tool as well.
Once the file is in GCP, you can use the Box G Suite integration to see and manage your files.
We have PostgreSQL instances (1 master + 1 read replica) on Google Cloud SQL. Our Django (1.11.12) application uses these databases via the PostGIS engine. When we try to use the database, we see this error message:
django.db.utils.OperationalError: canceling statement due to conflict with recovery
DETAIL: User query might have needed to see row versions that must be removed.
When I search for a solution, the suggestions generally say that I need to change the hot_standby_feedback flag. But as you know, the Cloud SQL service has some restrictions on settings, and I can't set that flag.
How can I fix this?
If “Google SQL” allows that, you can set max_standby_streaming_delay to -1 so that replication is delayed if a conflict is detected.
Then the query will not be canceled, but replication may lag if applying changes would cause a conflict.
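If the flag is exposed for your instance at all (check the Supported flags documentation for your PostgreSQL version; this is not guaranteed), setting it on the replica would look roughly like this, with the replica name as a placeholder:

# Note: patching database flags may restart the instance
gcloud sql instances patch my-read-replica --database-flags=max_standby_streaming_delay=-1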
Consider getting an “unfettered” PostgreSQL.
If you would like to set hot_standby_feedback = on, I suggest that you indicate your interest in the open feature request on Google Cloud Platform's Public Issue Tracker. That way someone can look into handling the query conflict issues your Cloud SQL PostgreSQL instance encountered.
I've also been monitoring an open thread in the Issue Tracker about making max_standby_archive_delay and max_standby_streaming_delay flags available to users to set. You can track it there as well. Hope this helps!
While Spanner looks exciting, the documentation for the Simba JDBC driver (included in the download links here: https://cloud.google.com/spanner/docs/partners/drivers) is relatively sparse, especially when compared to the documentation for the Simba JDBC BigQuery driver (https://cloud.google.com/bigquery/partners/simba-drivers/).
In particular, the documentation only mentions one connection string:
jdbc:cloudspanner://localhost;Project=simba-cloudspanner-jdbc;Instance=test-instance;Database=example-db
... there is no information about how to specify, for example, a service account and its p12 credentials or a path to a JSON file, which many Google Cloud services use.
Can anyone share JDBC connection strings or other setup details they have successfully used to connect to the service? I have tried, for example, setting the environment variable GOOGLE_APPLICATION_CREDENTIALS and providing a JDBC string in the same style as above, but to no avail.
Ideally, I would like to use a combination of instance id, project name, database name, a service account email, and a p12 file, but am open to other authentication options.
EDIT: When attempting the GOOGLE_APPLICATION_CREDENTIALS strategy, I generated this log file, in case it might be of any help https://gist.github.com/aryeh-looker/e6b1b1617d301f0a247463216c96535d
Double-checked my work, and it looks like I am in fact able to connect with a connection string as above and by setting the environment variable GOOGLE_APPLICATION_CREDENTIALS. It would be ideal to have some other options, and the documentation is still a bit spotty (no mention of the environment variable), so more information would be welcome.
This is a semi-workable solution. It suffers from the fact that you cannot have multiple connections with different service accounts in the same process.
EDIT 2: This does not seem to work. I get errors about the instance not being specified when pointing to a JSON file.
EDIT: looks like with the latest release of the Spanner driver, there is a way to do this.
The latest release of the driver (1.0.4.1005) appears to support an optional JDBC parameter PvtKeyPath which takes a path to your private key as opposed to having to set the GOOGLE_APPLICATION_CREDENTIALS variable. Worth a look.
From the included PDF documentation:
So you will have a URL like: jdbc:cloudspanner://;Project=...;PvtKeyPath=/path/to/credentials.json
As the JDBC driver supplied by Google is severely limited (it does not support DML and DDL statements), I have written my own JDBC driver. The driver is designed to work with JPA/Hibernate-enabled applications. The driver can be found here: https://github.com/olavloite/spanner-jdbc
This driver supports the same kind of URL's as the driver supplied by Google, including the PvtKeyPath property. It is still BETA, but I already use it for one of my own applications.