I have a managed Cloud SQL database with a read replica attached to it.
I would like to connect BigQuery to Cloud SQL. Is it possible to connect Google BigQuery to a Cloud SQL read replica?
Yes, it is possible.
To run queries in BigQuery over data residing in Cloud SQL, you can use federated queries, which are queries over data that does not reside in BigQuery but is registered as an external data source.
To perform these queries you can use the following syntax:
SELECT * FROM EXTERNAL_QUERY(<CONNECTION_ID>, <EXTERNAL_DATABASE_QUERY>);
The CONNECTION_ID is the one assigned in BigQuery when creating the external data source connection with the following steps:
Go to the BigQuery console.
Click +Add Data and select External data source.
A panel will appear on the right side of the window; fill in the form with the details of your Cloud SQL read replica instance.
For the connection ID, choose a string you will remember, as it is the one you will use in federated queries.
Click Create connection.
These steps create the connection between BigQuery and Cloud SQL. Once the connection is created, you can run federated queries to read data from Cloud SQL instances.
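If you prefer to script this instead of using the console, here is a minimal sketch using the google-cloud-bigquery-connection Python client; the project, location, instance, database, and credentials are placeholder assumptions:

from google.cloud import bigquery_connection_v1 as bq_connection

client = bq_connection.ConnectionServiceClient()

# Connections are regional; the location must match your BigQuery region.
parent = "projects/my-project/locations/us"

connection = bq_connection.Connection(
    friendly_name="replica_conn",
    cloud_sql=bq_connection.CloudSqlProperties(
        # Instance connection name in project:region:instance form.
        instance_id="my-project:us-central1:my-read-replica",
        database="mydb",
        type_=bq_connection.CloudSqlProperties.DatabaseType.POSTGRES,
        credential=bq_connection.CloudSqlCredential(
            username="bq_reader",
            password="secret",
        ),
    ),
)

response = client.create_connection(
    parent=parent, connection_id="replica_conn", connection=connection
)
print(f"Created connection: {response.name}")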
The EXTERNAL_DATABASE_QUERY is the query you would have run in Cloud SQL to get this data.
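For example, here is a hedged sketch with the google-cloud-bigquery Python client; the connection my-project.us.replica_conn and the orders table are hypothetical:

from google.cloud import bigquery

client = bigquery.Client()

# EXTERNAL_QUERY takes the connection ID and the query to run on the
# Cloud SQL side, both as strings.
sql = """
SELECT *
FROM EXTERNAL_QUERY(
  "my-project.us.replica_conn",
  "SELECT id, amount, created_at FROM orders LIMIT 100"
)
"""

for row in client.query(sql).result():
    print(dict(row))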
You can use Cloud SQL as an external data source in BigQuery.
We have production databases (PostgreSQL and MySQL) on Cloud SQL.
How could I export the data from the production databases and then append it to BigQuery datasets?
I DO NOT want to sync or replicate the data into BigQuery, because we purge (after backing up) the production databases on a regular basis.
The only method I could think of is:
Export to CSV and then drop the files into Google Cloud Storage.
Use a Python script to append the data into BigQuery.
Are there any other, more optimal ways?
BigQuery supports external data sources, specifically federated queries, which allow you to read data directly from a Cloud SQL instance.
You can use this feature to select from all the relevant tables in your Postgres/MySQL instances and copy them into BigQuery without any extra ETL process. You can append the data to your existing tables, create a new table every time, or use some other organization that works for you.
BigQuery also supports scheduled queries so you can automate this.
The actual SQL will depend on your data sources, but it's not much more than:
INSERT INTO `your_dataset.your_bq_table`
SELECT *
FROM EXTERNAL_QUERY("your-project.us.postgres123", "SELECT * FROM tablename;")
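To automate this with the scheduled queries mentioned above, one option is to create the schedule programmatically. A minimal sketch using the google-cloud-bigquery-datatransfer Python client, where the project, connection ID, and table names are the same placeholder assumptions as above:

from google.cloud import bigquery_datatransfer

transfer_client = bigquery_datatransfer.DataTransferServiceClient()

# The DML statement to run on a schedule: copy from Cloud SQL via the
# federated connection and append into the BigQuery table.
query = """
INSERT INTO `your_dataset.your_bq_table`
SELECT *
FROM EXTERNAL_QUERY("your-project.us.postgres123", "SELECT * FROM tablename;")
"""

transfer_config = bigquery_datatransfer.TransferConfig(
    display_name="Append Cloud SQL data",
    data_source_id="scheduled_query",
    schedule="every 24 hours",
    params={"query": query},
)

transfer_config = transfer_client.create_transfer_config(
    parent=transfer_client.common_project_path("your-project"),
    transfer_config=transfer_config,
)
print(f"Created scheduled query: {transfer_config.name}")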
I need your suggestions on the following scenario:
USE CASE
I have an on-premises MySQL database (20 tables) and I need to transfer/sync certain tables (6 of them) from this database to BigQuery for reporting.
Possible solutions:
1- Transfer the whole database to Cloud SQL using the Database Migration Service (DMS), then connect the Cloud SQL instance to BigQuery and query the needed tables for reporting.
2- Use a Dataflow pipeline with Pub/Sub: How do I move data from MySQL to BigQuery?
Any suggestions on how to sync some tables to BigQuery without migrating the whole database?
Big Thanks!
I am able to discover BigQuery datasets and GCS files in Google Data Catalog, but I could not find Cloud SQL or Cloud Spanner options in the Data Catalog UI.
Is it possible to view Cloud SQL tables and Cloud Spanner tables in Data Catalog? If yes, please suggest steps or provide documentation links.
Thanks.
Yes, it is possible using Data Catalog custom entries.
To view Cloud SQL tables, you can use the open-source connectors for MySQL, SQL Server, and PostgreSQL.
Also check the on-premises ingestion use cases in the official docs.
Yes, it is possible.
Details:
Data sources other than the native metadata types (GCS, Pub/Sub, and BigQuery) need to be handled via the Catalog APIs.
Ref: https://cloud.google.com/data-catalog/docs/how-to/custom-entries
That is, use one of the seven supported client-library languages to programmatically loop through all the tables in your custom data source (e.g. Bigtable) and create entries dynamically, as in the sketch below.
My favorites are Python and C#.
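For illustration, a minimal Python sketch that registers a single table as a custom entry; the project, location, entry group, and table names are made up, and a real script would loop over every table discovered in the source:

from google.cloud import datacatalog_v1

client = datacatalog_v1.DataCatalogClient()

# Custom entries live in an entry group that you create yourself.
entry_group = client.create_entry_group(
    parent="projects/my-project/locations/us-central1",
    entry_group_id="my_cloud_sql_group",
    entry_group=datacatalog_v1.EntryGroup(display_name="Cloud SQL tables"),
)

# One entry per table; in a real script, create these in a loop.
entry = datacatalog_v1.Entry(
    display_name="orders",
    user_specified_system="cloud_sql",
    user_specified_type="table",
    linked_resource="//sqladmin.googleapis.com/projects/my-project/instances/my-instance",
)
entry = client.create_entry(
    parent=entry_group.name, entry_id="orders", entry=entry
)
print(f"Created entry: {entry.name}")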
I'd appreciate it if anyone has a better alternative approach.
Unfortunately, there is no native integration between Data Catalog and Cloud SQL or Cloud Spanner. Nevertheless, there is an issue tracker entry where this feature has been requested.
As you can see in the shared link, as a workaround you can manually create a JDBC connection to Spanner and export the metadata to Data Catalog custom entries on a schedule, along the lines of Mahendren's suggestion. You can do something similar with Cloud SQL.
I'm trying to get data from my Google Cloud Spanner database into a web page, but I'm not familiar with Cloud Spanner. I created some tables and added data to them. Now I want to get that data onto my own web page.
Please follow this tutorial, which explains step by step how to work with Cloud Spanner from PHP:
1. Create a Cloud Spanner instance and database.
2. Write, read, and execute SQL queries on data in the database.
3. Update the database schema.
4. Update data using a read-write transaction.
5. Add a secondary index to the database.
6. Use the index to read and execute SQL queries on data.
7. Retrieve data using a read-only transaction.
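The tutorial is in PHP, but the client pattern is the same in other languages; for instance, here is a minimal read-only query in Python, where the instance, database, and table names are assumptions:

from google.cloud import spanner

client = spanner.Client()
instance = client.instance("my-instance")
database = instance.database("my-database")

# Read-only snapshot; suitable for serving data to a web page.
with database.snapshot() as snapshot:
    results = snapshot.execute_sql("SELECT SingerId, FirstName FROM Singers")
    for row in results:
        print(row)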
Can we execute a SQL query inside a DMS task so that it fetches only the required data and not the whole DB?
If that's not possible, which AWS service can be used to fetch query-based data from an on-prem data source into AWS S3?
You can use filters and/or exclude fields in the task's table mappings: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.CustomizingTasks.TableMapping.html. See the sketch below.
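For example, here is a hedged sketch of a table mapping with a source filter, applied through boto3; the ARNs, schema, table, and column names are placeholders:

import json
import boto3

# Selection rule with a source filter: only rows of myschema.orders where
# created_at >= 2023-01-01 are migrated, not the whole table.
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "filter-orders",
            "object-locator": {"schema-name": "myschema", "table-name": "orders"},
            "rule-action": "include",
            "filters": [
                {
                    "filter-type": "source",
                    "column-name": "created_at",
                    "filter-conditions": [
                        {"filter-operator": "gte", "value": "2023-01-01"}
                    ],
                }
            ],
        }
    ]
}

dms = boto3.client("dms")
dms.create_replication_task(
    ReplicationTaskIdentifier="orders-to-s3",
    SourceEndpointArn="arn:aws:dms:...:endpoint:SOURCE",
    TargetEndpointArn="arn:aws:dms:...:endpoint:TARGET",
    ReplicationInstanceArn="arn:aws:dms:...:rep:INSTANCE",
    MigrationType="full-load",
    TableMappings=json.dumps(table_mappings),
)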
Contact me if you have problems.
As an alternative to DMS, you can use AWS Glue, retrieving the data from the on-prem DB into a PySpark DataFrame and writing it to either S3 or AWS RDS. This works very well; the only downside is the cost.
This solution supports both a table and a SQL query as the input for data extraction, as in the sketch below.
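A minimal sketch of that Glue/PySpark approach, intended to run as an AWS Glue job; the JDBC URL, credentials, table/query, and S3 path are placeholder assumptions:

from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session

# Either a table name or a SQL subquery can be passed as "dbtable".
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://onprem-host:3306/mydb")
    .option("dbtable", "(SELECT id, amount FROM orders WHERE amount > 0) AS t")
    .option("user", "etl_user")
    .option("password", "secret")
    .load()
)

# Write the result to S3 as Parquet; a JDBC write to RDS works similarly.
df.write.mode("append").parquet("s3://my-bucket/orders/")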