I need to query in GCP and migrate the result (1M rows) to Teradata since the BI team only has access to Teradata.
What would be the best way to complete this job?
This has to be done every month, so if there is any way to automate this, it would be perfect.
Thank you
There is a QueryGrid connector from Teradata to BigQuery. Using it, you can access objects residing in BigQuery directly from Teradata. This would be the easiest way.
Alternatively, you can use Google Cloud Storage as intermediate storage that both BigQuery and Teradata can access.
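On the BigQuery side, the Cloud Storage route can be scripted with an EXPORT DATA statement, which a BigQuery scheduled query can run monthly; the bucket, dataset, and table names below are placeholders:

```sql
-- Export the monthly query result to Cloud Storage as CSV;
-- Teradata can then ingest the files from the bucket.
EXPORT DATA OPTIONS (
  uri = 'gs://my-transfer-bucket/monthly/*.csv',  -- placeholder bucket path
  format = 'CSV',
  overwrite = true,
  header = true
) AS
SELECT *
FROM `my_project.my_dataset.my_table`;  -- placeholder source table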
We have production databases (postgresql and mysql) on Cloud SQL.
How could I export the data from the production databases, and then append to BigQuery datasets?
I DO NOT want to sync or replicate the data into BigQuery because we purge (after backing up) the production databases on a regular basis.
The only method I could think of is:
Export to CSV and then drop into Google Cloud Storage
Python script to append the data into BigQuery.
Are there any other more optimal ways?
BigQuery supports external data sources, specifically federated queries which allow you to read data directly from a Cloud SQL instance.
You can use this feature to select from all the relevant tables in your Postgres/MySQL instances and copy them into BigQuery without any extra ETL process. You can append the data to your existing tables, create a new table every time, or use some other organization that works for you.
BigQuery also supports scheduled queries so you can automate this.
The actual SQL will depend on your data sources, but it's not much more than...
INSERT INTO `your_bq_table`
SELECT *
FROM EXTERNAL_QUERY(
  'your_project.us.postgres123',  -- connection ID of the Cloud SQL connection
  'SELECT * FROM tablename');
I have a use case where I need to sync Spanner tables with BigQuery tables, i.e. update the Spanner tables based on updated data in the BigQuery tables. I am planning to use Cloud Data Fusion for this, but I do not see any examples for this scenario. Any pointers on this?
I need your suggestions on the following Scenario:
USE CASE
I have an on-premises MySQL database (20 tables) and I need to transfer/sync certain tables (6 of them) from this database to BigQuery for reporting.
Solution:
1- Transfer the whole database to Cloud SQL using the Database Migration Service (DMS), then connect the Cloud SQL instance with BigQuery and query the needed tables for reporting.
2- Use a Dataflow pipeline with Pub/Sub: How do I move data from MySQL to BigQuery?
Any suggestions on how to sync some tables to BigQuery without migrating the whole database?
Big Thanks!
I need to migrate 70TB data (2400 tables) from on-premises Hive to BigQuery. Initial plan is to load ORC files from Hive to Cloud Storage and then to BigQuery tables.
Is there a better way to achieve this through automation or another GCP service?
I would suggest leveraging data pipelines for the stated purpose.
Here’s a reference on how to use them - https://cloud.google.com/architecture/dw2bq/dw-bq-data-pipelines#what-is-a-data-pipeline
You can also explore the different ways to transfer your on-prem data to BigQuery here - https://cloud.google.com/architecture/dw2bq/dw-bq-migration-overview
And please note that BigQuery can load ORC files from Cloud Storage natively (alongside Avro, Parquet, CSV, and JSON), so your initial plan works without any format conversion.
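For the Cloud Storage → BigQuery step, one option is a LOAD DATA statement, which can also be automated as a scheduled query; this is a sketch assuming the exports were staged as Avro, with placeholder dataset, table, and bucket names:

```sql
-- Load staged files from Cloud Storage into a BigQuery table;
-- swap the format parameter to match the files you actually stage.
LOAD DATA INTO my_dataset.my_hive_table  -- placeholder target table
FROM FILES (
  format = 'AVRO',
  uris = ['gs://my-migration-bucket/hive-export/my_hive_table/*.avro']  -- placeholder bucket
);
```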
I have a managed Cloud SQL DB with a read replica attached to it.
I would like to connect BigQuery to Cloud SQL. Is it possible to connect BigQuery with a Cloud SQL read replica?
Yes, it is possible.
To query data residing in Cloud SQL from BigQuery you can use federated queries, i.e. queries over data that is not stored in BigQuery but is registered as an external data source.
To perform these queries you can use the following syntax:
SELECT * FROM EXTERNAL_QUERY(<CONNECTION_ID>, <EXTERNAL_DATABASE_QUERY>);
The CONNECTION_ID is the one given in BigQuery when creating the external data source connection with the following steps:
Go to the BigQuery console
Click on +Add Data and select External data source
A menu will appear on the right side of your window; fill in the form with the data of your Cloud SQL read replica instance
For the connection ID, choose a string you can remember, as it will be the one used in the federated queries
Click Create connection
These steps create the connection between BigQuery and Cloud SQL. Once the connection exists, you can run federated queries against data in your Cloud SQL instances.
The EXTERNAL_DATABASE_QUERY is the query you would have run in Cloud SQL to get this data.
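For example, to pull rows from a table on the replica (the connection ID, table, and column names below are placeholders):

```sql
-- Federated query against the Cloud SQL read replica;
-- 'my-project.us.replica-connection' is a placeholder connection ID.
SELECT *
FROM EXTERNAL_QUERY(
  'my-project.us.replica-connection',
  'SELECT id, name FROM customers;');
```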
You can use Cloud SQL as an external data source in BigQuery.