We have production databases (PostgreSQL and MySQL) on Cloud SQL.
How can I export the data from the production databases and append it to BigQuery datasets?
I do NOT want to sync or replicate the data into BigQuery, because we purge the production databases (after backing them up) on a regular basis.
The only method I can think of is:
Export to CSV and drop the files into Google Cloud Storage.
Use a Python script to append them to BigQuery.
Are there any more optimal ways?
BigQuery supports external data sources, specifically federated queries, which allow you to read data directly from a Cloud SQL instance.
You can use this feature to select from all the relevant tables in your Postgres/MySQL instances and copy them into BigQuery without any extra ETL process. You can append the data to your existing tables, create a new table every time, or use some other organization that works for you.
BigQuery also supports scheduled queries so you can automate this.
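The actual SQL will depend on your data sources, but with a BigQuery connection to the Cloud SQL instance (the connection ID below is a placeholder) it's not much more than:
INSERT INTO `your_dataset.your_bq_table`
SELECT *
FROM EXTERNAL_QUERY("your-project.us.your_connection_id", "SELECT * FROM tablename;");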
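If several tables need to be appended, the statement can be generated per table. The following is a minimal sketch, assuming a BigQuery connection to the Cloud SQL instance already exists; the function names, the connection ID, and the table mapping are hypothetical, and `client` is expected to be a `google.cloud.bigquery.Client`. Since the production databases are purged regularly, you would probably add a filter to the inner query so the same rows are not appended twice.

```python
def build_append_sql(target_table: str, connection_id: str, source_query: str) -> str:
    """Return an INSERT ... SELECT that appends Cloud SQL rows to a BigQuery table."""
    # EXTERNAL_QUERY takes the inner query as a string literal, so escape
    # backslashes and double quotes before embedding it.
    escaped = source_query.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f'INSERT INTO `{target_table}`\n'
        f'SELECT * FROM EXTERNAL_QUERY("{connection_id}", "{escaped}")'
    )


def append_tables(client, connection_id: str, table_map: dict) -> None:
    """Run one append per table; table_map maps source table -> BigQuery table."""
    for source_table, target_table in table_map.items():
        sql = build_append_sql(
            target_table, connection_id, f"SELECT * FROM {source_table}"
        )
        client.query(sql).result()  # block until this job has finished
```

For example, `append_tables(bigquery.Client(), "your-project.us.your_connection_id", {"orders": "your_dataset.orders"})` would append the `orders` table. The same generated SQL can also be pasted into a scheduled query instead of being run from a script.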
Related
I am trying to build AWS QuickSight reports using AWS Athena, which builds the specific views for those reports. However, I seem to be able to select only a single table when creating the Glue job, even though I can select all the tables I need for the crawler of the entire DynamoDB database.
What is the simplest route to get a complete extract of all tables that is queryable in Athena?
I don't want to connect the reports directly to DynamoDB, as it is a production database and I want some separation to avoid performance degradation caused by a poor query, etc.
I have a use case where I need to sync Spanner tables with BigQuery tables, so I need to update the Spanner tables based on the updated data in the BigQuery tables. I am planning to use Cloud Data Fusion for this, but I do not see any example available for this scenario. Any pointers on this?
I have some tables to load from BigQuery into a PostgreSQL Cloud SQL database. I need to do this every day and create some stored procedures in Cloud SQL. What is the best way to load tables from BigQuery into Cloud SQL every day? What are the cost implications of transferring the data and keeping Cloud SQL running 24/7? I appreciate your help.
Thanks,
J.
Usually, a Cloud SQL database is up full time to serve requests at any time. It is not a serverless product that can start when a request comes in. You can look at the pricing page to calculate the cost (mainly CPU, memory, and storage; size the database according to your usage and expected performance).
As for the process, this is what we did at my previous company:
Use Cloud Scheduler to trigger a Cloud Function
Create temporary tables in BigQuery
Export the BigQuery temporary tables to CSV in Cloud Storage
Run a Cloud SQL import of the files from GCS into temporary tables
Run a query in the database to merge the imported data into the existing data, and to delete the table of imported data
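The last two steps can be sketched as follows. This is an illustration only, assuming a PostgreSQL target whose key column has a unique constraint and at least one non-key column; the function names and the table, bucket, and column names in the usage note are hypothetical. The first helper builds a request body in the shape used by the Cloud SQL Admin API `instances.import` method for CSV files, and the second builds the merge and cleanup statements.

```python
def build_import_body(gcs_uri: str, database: str, staging_table: str) -> dict:
    """Request body for a Cloud SQL CSV import from Cloud Storage."""
    return {
        "importContext": {
            "fileType": "CSV",
            "uri": gcs_uri,
            "database": database,
            "csvImportOptions": {"table": staging_table},
        }
    }


def build_merge_sql(target: str, staging: str, key: str, columns: list) -> list:
    """PostgreSQL statements that upsert staging rows, then drop the staging table."""
    col_list = ", ".join([key] + columns)
    # On a key collision, overwrite the existing row with the imported values.
    updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in columns)
    upsert = (
        f"INSERT INTO {target} ({col_list}) "
        f"SELECT {col_list} FROM {staging} "
        f"ON CONFLICT ({key}) DO UPDATE SET {updates}"
    )
    return [upsert, f"DROP TABLE {staging}"]
```

For example, `build_merge_sql("orders", "orders_staging", "id", ["status", "total"])` returns the upsert followed by `DROP TABLE orders_staging`; running both in one transaction keeps the target consistent if the merge fails.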
If that takes too long to run in a single function, you can use Cloud Run (60-minute timeout) or a dispatch function. The dispatch function is called by Cloud Scheduler and publishes a message to Pub/Sub for each table to process. On Pub/Sub, you can plug in a Cloud Function (or a Cloud Run service) that performs the previous process only on the table named in the message. That way, you process all the tables concurrently rather than sequentially.
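A dispatch function of this kind could look roughly like the sketch below. The message format (`{"table": ...}`) and the function names are assumptions made for illustration; `publish` is any callable that accepts the message bytes, which keeps the dispatch logic independent of the Pub/Sub client.

```python
import json


def dispatch(tables, publish) -> int:
    """Publish one message per table and return how many were sent."""
    count = 0
    for table in tables:
        # Each message names exactly one table, so workers can run concurrently.
        payload = json.dumps({"table": table}).encode("utf-8")
        publish(payload)
        count += 1
    return count


def make_pubsub_publisher(project_id: str, topic_id: str):
    """Return a publish callable backed by the google-cloud-pubsub client."""
    from google.cloud import pubsub_v1  # imported lazily; needs google-cloud-pubsub

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)
    # .result() waits until Pub/Sub has acknowledged the message.
    return lambda data: publisher.publish(topic_path, data).result()
```

Inside the scheduled function you would then call something like `dispatch(["orders", "users"], make_pubsub_publisher("your-project", "tables-to-load"))`, where the project and topic names are placeholders.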
As for cost, you will pay for:
BigQuery queries (the volume of data you process to create the temporary tables)
BigQuery storage (very low; you can create temporary tables that expire, i.e. are deleted automatically, after 1 hour)
Cloud Storage storage (very low; you can set a lifecycle rule on the files to delete them after a few days)
File transfer: free if you stay in the same region
Export and import: free
In summary, only the BigQuery queries and the Cloud SQL instance are major costs.
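The two storage-related savings above can be configured up front. Below is a minimal sketch with hypothetical helper names: the first builds a BigQuery DDL statement for a table that expires after a given number of hours, and the second builds a Cloud Storage lifecycle configuration (the JSON shape accepted by `gsutil lifecycle set`) that deletes objects after a given number of days.

```python
import json


def temp_table_ddl(table: str, select_sql: str, hours: int = 1) -> str:
    """DDL for a BigQuery table that is deleted automatically after `hours`."""
    return (
        f"CREATE OR REPLACE TABLE `{table}`\n"
        f"OPTIONS (expiration_timestamp = "
        f"TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL {hours} HOUR))\n"
        f"AS {select_sql}"
    )


def lifecycle_config(days: int) -> str:
    """Lifecycle JSON that deletes bucket objects older than `days` days."""
    rule = {"action": {"type": "Delete"}, "condition": {"age": days}}
    return json.dumps({"rule": [rule]}, indent=2)
```

Writing the output of `lifecycle_config(3)` to a file and applying it to the export bucket would remove the CSV files three days after they are created.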
I need your suggestions on the following scenario:
USE CASE
I have an on-premises MySQL database (20 tables) and I need to transfer/sync certain tables (6 tables) from this database to BigQuery for reporting.
Possible solutions:
1- Transfer the whole database to Cloud SQL using Database Migration Service (DMS), then connect the Cloud SQL instance to BigQuery and query the tables needed for reporting.
2- Use a Dataflow pipeline with Pub/Sub, as in "How do I move data from MySQL to BigQuery?"
Any suggestions on how to sync some tables to BigQuery without migrating the whole database?
Big Thanks!
I am able to discover BigQuery datasets and GCS files in Google Data Catalog, but I could not find Cloud SQL or Cloud Spanner options in the Data Catalog UI.
Is it possible to view Cloud SQL tables and Cloud Spanner tables in Data Catalog? If yes, please suggest steps or provide links to the documentation.
Thanks.
Yes, it is possible using Data Catalog custom entries.
To view Cloud SQL tables, you can use the open source connectors for MySQL, SQL Server, and PostgreSQL.
Also check the on-premises ingestion use cases in the official docs.
Yes, it is possible.
Details:
Sources other than the native metadata types (GCS, Pub/Sub, and BigQuery) need to be handled through the Data Catalog APIs.
Ref: https://cloud.google.com/data-catalog/docs/how-to/custom-entries
That is:
Use one of the 7 supported languages to programmatically loop through all the tables of the custom data source (e.g. Bigtable) and create tags dynamically.
My favorites are Python and C#.
I would appreciate it if anyone has a better alternative approach.
Unfortunately, there is no native integration between Data Catalog and Cloud SQL or Cloud Spanner. However, a feature request for this has been filed in the issue tracker.
As you can see in the shared link, as a workaround you can manually create a JDBC connector to Spanner and export the metadata to Data Catalog custom entries on a schedule, similar to Mahendren's suggestion. You can do something similar with Cloud SQL.
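To make the custom-entries workaround more concrete, here is a rough sketch of the metadata-shaping step. It assumes you have already read `(table, column, data_type)` rows from the source database (for example from `information_schema.columns`) and turns them into one entry body per table, using the field names of the Data Catalog REST `Entry` resource. The function name, the system name, and the linked-resource prefix are hypothetical; creating the entry group and sending each body to the API are left out.

```python
def entries_from_columns(rows, system: str, resource_prefix: str) -> dict:
    """Group column metadata by table and build one custom-entry body per table."""
    columns_by_table = {}
    for table, column, data_type in rows:
        columns_by_table.setdefault(table, []).append(
            {"column": column, "type": data_type}
        )
    return {
        table: {
            "displayName": table,
            "userSpecifiedSystem": system,  # marks the entry as a custom source
            "userSpecifiedType": "table",
            "linkedResource": f"{resource_prefix}/{table}",
            "schema": {"columns": cols},
        }
        for table, cols in columns_by_table.items()
    }
```

Each value in the returned dictionary could then be passed to the entry creation call of whichever client library you prefer, and the whole job run on a schedule so the catalog follows schema changes.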