Load data from BigQuery to Postgres Cloud SQL database every day - google-cloud-platform

I have some tables to load from BigQuery to a Postgres Cloud SQL database. I need to do this every day and create some stored procedures in Cloud SQL. What is the best way to load tables from BigQuery to Cloud SQL every day? What are the cost implications of transferring the data and keeping Cloud SQL on 24/7? Appreciate your help.
Thanks,
J.

Usually, a Cloud SQL database is up full time to serve requests at any time. It's not a serverless product that can start when a request comes in. You can have a look at the pricing page to calculate the cost (mainly CPU, memory and storage; size the database according to your usage and expected performance).
About the process, here is what we did at my previous company (a sketch of one table's run follows this list):
Use Cloud Scheduler to trigger a Cloud Function
Create temporary tables in BigQuery
Export the BigQuery temporary tables to CSV files in Cloud Storage
Run a Cloud SQL import of the files from GCS into staging tables
Run a query in the database to merge the imported data into the existing data, then drop the staging tables
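For illustration, here is a minimal Python sketch of one table's run, assuming the temporary table has already been created in BigQuery; the project, bucket, dataset, Cloud SQL instance and database names are placeholders:
from google.cloud import bigquery
from googleapiclient import discovery  # Cloud SQL Admin API client
def export_and_import(table_name):
    bq = bigquery.Client()
    gcs_uri = f"gs://your-staging-bucket/{table_name}.csv"
    # 1. Export the BigQuery temporary table to a CSV file in Cloud Storage.
    bq.extract_table(f"your_project.tmp_dataset.{table_name}", gcs_uri).result()
    # 2. Ask Cloud SQL to import that CSV into a staging table in Postgres
    #    (the import runs as an asynchronous operation that can be polled if needed).
    sqladmin = discovery.build("sqladmin", "v1")
    body = {
        "importContext": {
            "fileType": "CSV",
            "uri": gcs_uri,
            "database": "your_database",
            "csvImportOptions": {"table": f"staging_{table_name}"},
        }
    }
    sqladmin.instances().import_(
        project="your_project", instance="your-cloudsql-instance", body=body
    ).execute()
    # 3. A final SQL statement (or stored procedure) in Postgres merges
    #    staging_{table_name} into the target table and drops the staging table.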
If it takes too much time to perform all of that in a single function, you can use Cloud Run (60-minute timeout) or a dispatch function. That function is called by Cloud Scheduler and publishes a message to Pub/Sub for each table to process. On Pub/Sub, you can plug in a Cloud Function (or a Cloud Run service) that performs the previous process only on the table mentioned in the message. That way, you process all the tables concurrently instead of sequentially.
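A possible shape for that dispatch function (the topic name and table list are made up for the example):
from google.cloud import pubsub_v1
TABLES = ["table_a", "table_b", "table_c"]  # placeholder list of tables to process
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("your_project", "bq-to-cloudsql-tables")
def dispatch(request):
    # HTTP-triggered entry point called by Cloud Scheduler
    futures = [publisher.publish(topic_path, table.encode("utf-8")) for table in TABLES]
    for future in futures:
        future.result()  # block until each message is actually published
    return "dispatched"
The worker Cloud Function (or Cloud Run service) subscribed to the topic then runs the export/import sketch above for the single table named in the message.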
About cost, you will pay for:
BigQuery queries (the volume of data that you process to create the temporary tables)
BigQuery storage (very low; you can create temporary tables that expire, i.e. are automatically deleted, after 1 hour)
Cloud Storage (very low; you can set a lifecycle rule on the files to delete them after a few days)
File transfer: free if you stay in the same region.
Export and import: free
In summary, only the BigQuery queries and the Cloud SQL instance are major costs.

Related

Export data from Google Cloud SQL, and append to BigQuery

We have production databases (PostgreSQL and MySQL) on Cloud SQL.
How could I export the data from the production databases, and then append to BigQuery datasets?
I DO NOT want to sync or replicate the data into BigQuery because we purge (after backing up) the production databases on a regular basis.
The only method I could think of is:
Export to CSV and then drop into Google Cloud Storage
A Python script to append it into BigQuery.
Are there any other more optimal ways?
BigQuery supports external data sources, specifically federated queries, which allow you to read data directly from a Cloud SQL instance.
You can use this feature to select from all the relevant tables in your Postgres/MySQL instances and copy them into BigQuery without any extra ETL process. You can append the data to your existing tables, create a new table every time, or use some other organization that works for you.
BigQuery also supports scheduled queries, so you can automate this.
The actual SQL will depend on your data sources, but with a federated query it's not much more than...
INSERT INTO `your_bq_table`
SELECT *
FROM EXTERNAL_QUERY('your_project.us.postgres123', 'SELECT * FROM tablename;')
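If you would rather create that schedule programmatically than in the console, a rough sketch with the BigQuery Data Transfer Service Python client could look like this (the project and display name are placeholders, and a DML query such as the INSERT above needs no destination settings):
from google.cloud import bigquery_datatransfer
transfer_client = bigquery_datatransfer.DataTransferServiceClient()
parent = transfer_client.common_project_path("your_project")
transfer_config = bigquery_datatransfer.TransferConfig(
    display_name="Daily Cloud SQL copy",
    data_source_id="scheduled_query",
    params={
        "query": "INSERT INTO `your_bq_table` "
                 "SELECT * FROM EXTERNAL_QUERY('your_project.us.postgres123', 'SELECT * FROM tablename;')",
    },
    schedule="every 24 hours",
)
transfer_config = transfer_client.create_transfer_config(
    parent=parent, transfer_config=transfer_config
)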

BigQuery Data Transfer Service for specific Google Play app data

Can the Google BigQuery Data Transfer Service allow me to transfer specific app data automatically?
For example, I have 10 apps in my Google Play console, but I only want to transfer data for 3 of them to BQ. Is it possible to make this work, or is there another approach?
Also, I just read the pricing doc: the monthly charge is $25 per unique package name in the Installs_country table.
I don't quite understand how to calculate my cost with that example.
Thank you.
For your requirement, you can download the reports of a specific app to Cloud Storage by selecting, in the Play Console, the app for which you want to get the data, and then send them to BigQuery using the BigQuery Data Transfer Service. As for the Google Play cost calculation, it is $25 per month per unique package name stored in the Installs_country table in BigQuery: for example, transferring data for 3 apps (3 unique package names) would be 3 × $25 = $75 per month.
To select the specific app, follow the steps given below:
Go to the Play Console.
Click on Download Reports and select the type of report you want.
Under "Select an application," type and select the app for which you want to get the data.
Select the year and month for which you want to download the report.
If you are storing data in a Cloud Storage bucket, that will incur cost; the pricing for data transfer from one storage bucket to another can be checked in this link. Since you are storing and querying the data in BigQuery, that will also be chargeable; for BigQuery pricing details you can check this documentation. You can use the Billing Calculator to estimate your costs.

Automatically start load to BigQuery when file is uploaded to bucket/cloud storage

We have a script exporting CSV files from another database and uploading them to a bucket on GCP Cloud Storage. Now, I know there's the possibility to schedule loads into BigQuery using the BigQuery Data Transfer Service, but I am a bit surprised that there doesn't seem to be a solution that triggers automatically when a file upload is finished.
Did I miss something?
You might need to handle that event (google.storage.object.finalize) by your own means.
For example, that event can trigger a Cloud Function (Google Cloud Storage Triggers), which can do various things, from triggering a load job to implementing complex data processing (cleaning, validation, merging, etc.) while the data from the file is being loaded into the BigQuery table.
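As a rough sketch, a Python Cloud Function wired to that finalize event could start the load job directly; the destination table and the CSV options below are assumptions to adapt:
import functions_framework
from google.cloud import bigquery
TABLE_ID = "your_project.your_dataset.your_table"  # assumed destination table
@functions_framework.cloud_event
def load_csv_to_bq(cloud_event):
    # The event payload carries the bucket and object name of the finalized file.
    data = cloud_event.data
    uri = f"gs://{data['bucket']}/{data['name']}"
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # assumes a header row
        autodetect=True,      # or provide an explicit schema instead
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    client.load_table_from_uri(uri, TABLE_ID, job_config=job_config).result()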

Federated BigQuery cost and performance optimization

I am writing a scheduled federated query to load my BigQuery tables on a daily basis. The BigQuery table load strategy is overwrite. My source is a Cloud SQL database (MySQL instance).
I am wondering what would be the correct approach, from a performance and cost-optimization perspective in the long run, to load my BigQuery tables. Should I overwrite my BigQuery tables daily with the source data, or should I build logic into my federated query itself, using joins to detect just the new additions in the source and then add them to my BigQuery table during the daily scheduled runs?
Your second idea is the way to go.
build logic into my federated query itself, using joins to detect just the new additions in the source and then add them to my BigQuery table
The less data BigQuery needs to read and write, the cheaper it will be.
This approach is generally referred to as incremental loading.
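As an illustration only (the table, connection and column names are assumptions), an incremental append with a federated query could look like the following, inserting just the rows that are newer than the current watermark in BigQuery:
from google.cloud import bigquery
client = bigquery.Client()
# Append only rows newer than what the BigQuery table already contains; far less
# data is written, and the inner MySQL query can also be narrowed with a date filter.
sql = """
INSERT INTO `your_project.your_dataset.your_table`
SELECT *
FROM EXTERNAL_QUERY(
  'your_project.us.your_mysql_connection',
  'SELECT * FROM source_table;'
)
WHERE updated_at > (
  SELECT IFNULL(MAX(updated_at), TIMESTAMP '1970-01-01')
  FROM `your_project.your_dataset.your_table`
)
"""
client.query(sql).result()
The same SQL can equally be pasted into the scheduled query itself rather than being run from a client.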

Google Cloud Datastore Billing

gcloud datastore export --namespaces="(default)" gs://${BUCKET}
Will Google charge us for Datastore read operations when we do Datastore exports? We'd like to run nightly backups, but we don't want to get charged an arm and a leg.
Yes. The cost may not be huge unless your Datastore contains a very large number of entities.
Refer to the table on the pricing page for details: https://cloud.google.com/datastore/pricing
Source:
Export and import operations are charged for entity reads and writes at the rates shown in the table above. If you cancel an export or import, you will be charged for operations performed up until the time that the cancel request has propagated through Cloud Datastore.
https://cloud.google.com/datastore/pricing