I have a use case where I need to sync Spanner tables with BigQuery tables, i.e. update the Spanner tables based on the updated data in the BigQuery tables. I am planning to use Cloud Data Fusion for this, but I do not see any examples available for this scenario. Any pointers on this?
We have production databases (PostgreSQL and MySQL) on Cloud SQL.
How could I export the data from the production databases and then append it to BigQuery datasets?
I DO NOT want to sync or replicate the data into BigQuery because we purge (after backing up) the production databases on a regular basis.
The only method I could think of is:
Export to CSV and then drop into Google Cloud Storage
A Python script to append the data into BigQuery (a rough sketch follows below).
Are there any other more optimal ways?
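A rough sketch of step 2, assuming the CSV has already been exported locally and using the google-cloud-storage and google-cloud-bigquery client libraries (bucket, file, and table names below are placeholders):

# Rough sketch of the CSV -> GCS -> BigQuery append step.
# Bucket, file, and table names are placeholders.
from google.cloud import bigquery, storage

# 1. Drop the exported CSV into Cloud Storage.
storage_client = storage.Client()
bucket = storage_client.bucket("my-bucket")
bucket.blob("exports/users.csv").upload_from_filename("/tmp/users.csv")

# 2. Append the file to an existing BigQuery table.
bq_client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,  # header row
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)
load_job = bq_client.load_table_from_uri(
    "gs://my-bucket/exports/users.csv",
    "myproject.mydataset.users",
    job_config=job_config,
)
load_job.result()  # wait for the load to finish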
BigQuery supports external data sources, specifically federated queries, which allow you to read data directly from a Cloud SQL instance.
You can use this feature to select from all the relevant tables in your Postgres/MySQL instances and copy them into BigQuery without any extra ETL process. You can append the data to your existing tables, create a new table every time, or use some other organization that works for you.
BigQuery also supports scheduled queries so you can automate this.
The actual SQL will depend on your data sources but it's not much more than...
INSERT INTO `your_bq_table`
SELECT *
FROM EXTERNAL_QUERY('your-project.us.postgres123', 'SELECT * FROM tablename');
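If you want to automate it as a scheduled query, a rough sketch with the BigQuery Data Transfer Service Python client could look like this (project, connection, table names, and the schedule are placeholders):

# Rough sketch: register the federated INSERT as a scheduled query that
# runs daily. Project, connection, and table names are placeholders.
from google.cloud import bigquery_datatransfer

client = bigquery_datatransfer.DataTransferServiceClient()

sql = """
INSERT INTO `mydataset.your_bq_table`
SELECT *
FROM EXTERNAL_QUERY('your-project.us.postgres123', 'SELECT * FROM tablename');
"""

transfer_config = bigquery_datatransfer.TransferConfig(
    display_name="Nightly Cloud SQL copy",
    data_source_id="scheduled_query",  # built-in data source for scheduled queries
    params={"query": sql},
    schedule="every 24 hours",
)

client.create_transfer_config(
    parent=client.common_project_path("your-project"),
    transfer_config=transfer_config,
)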
I need to run a query in GCP and migrate the result (1M rows) to Teradata, since the BI team only has access to Teradata.
What would be the best way to complete this job?
This has to be done every month, so if there is any way to automate this, it would be perfect.
Thank you
There is a QueryGrid connector from Teradata to BigQuery. Using this, you can access objects residing in BigQuery directly from Teradata. This would be the easiest way.
Alternatively, you can use Google Cloud Storage as intermediate storage that both BigQuery and Teradata can access.
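For the GCS route, a rough sketch with the BigQuery Python client: materialize the monthly query into a staging table, then export it to Cloud Storage as CSV for Teradata to pick up (all names below are placeholders):

# Rough sketch: run the monthly query into a staging table, then export it
# to GCS as CSV so Teradata can load it. All names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()
staging_table = "myproject.reporting.monthly_extract"

# 1. Materialize the query result (the ~1M rows) into a staging table.
job_config = bigquery.QueryJobConfig(
    destination=staging_table,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
client.query("SELECT * FROM `myproject.reporting.source_view`",
             job_config=job_config).result()

# 2. Export the staging table to Cloud Storage (CSV is the default format).
client.extract_table(
    staging_table,
    "gs://my-bucket/monthly/extract-*.csv",  # wildcard allows multiple shards
).result()

The job could then be triggered monthly with cron or Cloud Scheduler, and the Teradata side loaded with its own bulk-load tooling.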
I need your suggestions on the following scenario:
USE CASE
I have an on-premises MySQL database (20 tables) and I need to transfer/sync certain tables (6 tables) from this database to BigQuery for reporting.
Possible solutions:
1- Transfer the whole database to Cloud SQL using the Database Migration Service (DMS), then connect the Cloud SQL instance with BigQuery and query the needed tables for reporting.
2- Use a Dataflow pipeline with Pub/Sub (a rough sketch follows below): How do I move data from MySQL to BigQuery?
Any suggestions on how to sync some tables to BigQuery without migrating the whole database?
Big Thanks!
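Regarding option 2, a minimal sketch of a streaming Dataflow (Apache Beam) pipeline, assuming change events for the needed tables are already published to Pub/Sub as JSON (the subscription and table names are placeholders):

# Rough sketch of option 2: a streaming Beam pipeline that reads change
# events from Pub/Sub (as JSON) and appends them to BigQuery.
# Subscription and table names are placeholders.
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)  # plus --runner=DataflowRunner etc.

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadChanges" >> beam.io.ReadFromPubSub(
            subscription="projects/my-project/subscriptions/mysql-changes")
        | "ParseJson" >> beam.Map(json.loads)
        | "WriteToBQ" >> beam.io.WriteToBigQuery(
            "my-project:reporting.orders",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        )
    )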
I am trying to implement a BI solution using GCP where I have data in flat files in Cloud Datastore, and I have to push this data into my data warehouse on BigQuery. The data will be incremental after the first load.
There doesn't seem to be any ETL functionality that I can use to implement this incremental data load into my warehouse. Using Cloud Dataflow, I can push the delta load into the BigQuery tables, but this approach doesn't handle updated records correctly.
Can anyone suggest here what could be the best approach for implementing this solution?
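One way to handle the updated records in such a setup is to load each delta into a staging table and MERGE it into the warehouse table; a minimal sketch with the BigQuery Python client, assuming a key column id and a payload column status (all table and column names are placeholders):

# Rough sketch: upsert a delta that was loaded into a staging table.
# Tables mydataset.target and mydataset.staging, and the columns id and
# status, are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

merge_sql = """
MERGE `mydataset.target` T
USING `mydataset.staging` S
ON T.id = S.id
WHEN MATCHED THEN
  UPDATE SET T.status = S.status
WHEN NOT MATCHED THEN
  INSERT (id, status) VALUES (S.id, S.status)
"""

client.query(merge_sql).result()  # runs the upsert and waits for completion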
I want to load a large amount of data into Google Cloud BigQuery.
What are all the options at hand (using the UI and APIs), and what would be the fastest way?
TIA!
You can load data:
From Google Cloud Storage
From other Google services, such as DoubleClick and Google AdWords
From a readable data source (such as your local machine)
By inserting individual records using streaming inserts
Using DML statements to perform bulk inserts
Using a Google Cloud Dataflow pipeline to write data to BigQuery
More options are listed in Introduction to Loading Data into BigQuery
Loading data into BigQuery from Google Drive is not currently supported, but you can query data in Google Drive using an external table.
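For example, a rough sketch of defining an external table over a Google Sheet in Drive with the Python client (the spreadsheet URL and table name are placeholders, and the credentials need a Drive scope):

# Rough sketch: query a Google Sheet in Drive through an external table.
# The spreadsheet URL and table name are placeholders; the client's
# credentials must also include a Drive scope.
from google.cloud import bigquery

client = bigquery.Client()

external_config = bigquery.ExternalConfig("GOOGLE_SHEETS")
external_config.source_uris = [
    "https://docs.google.com/spreadsheets/d/YOUR_SPREADSHEET_ID"
]
external_config.options.skip_leading_rows = 1  # header row in the sheet

table = bigquery.Table("myproject.mydataset.drive_sheet")
table.external_data_configuration = external_config
client.create_table(table)

# The external table can now be queried like any other table.
rows = client.query("SELECT * FROM `myproject.mydataset.drive_sheet`").result()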
You can load data into a new table or partition, you can append data to an existing table or partition, or you can overwrite a table or partition. For more information on working with partitions, see Managing Partitioned Tables.
When you load data into BigQuery, you can supply the table or partition schema, or for supported data formats, you can use schema auto-detection.
Each method is fast; if your data is large, you should go with Google Cloud Storage (a minimal load sketch follows the format list below).
When you load data from Google Cloud Storage into BigQuery, your data can be in any of the following formats:
Comma-separated values (CSV)
JSON (newline-delimited)
Avro
Parquet
ORC (Beta)
Google Cloud Datastore backups
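As a minimal example of the Cloud Storage path with the Python client, assuming Avro files already sitting in a bucket (bucket and table names are placeholders):

# Rough sketch: load Avro files from Cloud Storage into a BigQuery table,
# overwriting it on each run. Bucket and table names are placeholders; Avro
# carries its own schema, so no explicit schema is needed here.
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.AVRO,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,  # overwrite
)

load_job = client.load_table_from_uri(
    "gs://my-bucket/export/part-*.avro",
    "myproject.mydataset.big_table",
    job_config=job_config,
)
load_job.result()  # wait for the load job to finish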