We are currently using AWS RDS for our databases, and we have defined some insert and update triggers on our tables. I would like to know whether BigQuery also supports triggers?
thanks
BigQuery is a data warehouse product, similar to AWS Redshift and AWS Athena, and it has no trigger support.
If you have been using AWS RDS so far, you should look at Google Cloud SQL instead.
Google Cloud SQL is an easy-to-use service that delivers fully managed SQL databases in the cloud. Google Cloud SQL provides either MySQL or PostgreSQL databases.
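As a rough illustration of how little setup is involved, a managed instance can be created with a single command; the instance name, tier, and region below are just placeholders:

    gcloud sql instances create my-rds-replacement \
        --database-version=MYSQL_8_0 \
        --tier=db-n1-standard-1 \
        --region=us-central1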
If you have a heavy load, then check out Google Cloud Spanner; it's even better as a fully scalable relational database.
Cloud Spanner is the only enterprise-grade, globally distributed, and strongly consistent database service built for the cloud specifically to combine the benefits of relational database structure with non-relational horizontal scale.
BigQuery doesn't have this feature, as stated in the answer above.
However, it has an event API based on its audit logs. You can inspect those logs and trigger actions with Cloud Functions, as described in:
https://cloud.google.com/blog/topics/developers-practitioners/how-trigger-cloud-run-actions-bigquery-events
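As a rough sketch of how such a trigger can be wired up with Eventarc (the project, service account, and Cloud Run service names are placeholders, and the audit-log filters are an assumption based on the general approach rather than taken verbatim from the article):

    gcloud eventarc triggers create bq-events-trigger \
        --location=us-central1 \
        --destination-run-service=my-bq-event-handler \
        --destination-run-region=us-central1 \
        --event-filters="type=google.cloud.audit.log.v1.written" \
        --event-filters="serviceName=bigquery.googleapis.com" \
        --event-filters="methodName=google.cloud.bigquery.v2.JobService.InsertJob" \
        --service-account=my-trigger-sa@my-project.iam.gserviceaccount.com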
Regards
Related
I need to perform periodic data purge and data load operations between GCP BigQuery and GCP Cloud SQL.
This involves running multiple queries in GCP BigQuery and GCP Cloud SQL in a predetermined sequence, and using the results of earlier queries in subsequent queries.
I am considering a few options, as described below.
Option 1: Use BigQuery "scheduled queries" together with "federated queries" (https://cloud.google.com/bigquery/docs/cloud-sql-federated-queries). This is good as long as the job only involves triggering read-only queries against the GCP Cloud SQL database and running multiple queries in GCP BigQuery.
However, since my operation involves purging data from GCP Cloud SQL, federated queries are ruled out.
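For reference, this is roughly what such a read-only federated query looks like when run through bq (the connection ID and table names are made up):

    bq query --use_legacy_sql=false '
    SELECT *
    FROM EXTERNAL_QUERY(
      "projects/my-project/locations/us/connections/my-cloudsql-conn",
      "SELECT id, status, updated_at FROM orders WHERE updated_at > NOW() - INTERVAL 1 DAY;")'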
Option 2: Another option I am considering is to use a GCP Compute Engine Linux VM as my controller for performing operations that span a GCP Cloud SQL MySQL database and GCP BigQuery.
I can run a cron job to schedule the operation.
As far as running GCP Cloud SQL queries from a GCP Compute Engine VM goes, that is well documented by Google in the tutorial "Connect from a VM instance" (learn how to connect to your Cloud SQL instance from a Compute Engine VM instance).
And for triggering the GCP BigQuery queries, the bq command-line tool (https://cloud.google.com/bigquery/docs/bq-command-line-tool) provides a good option.
This should allow me to run a sequence of interlaced BigQuery and GCP Cloud SQL queries; a rough sketch of the controller script I have in mind is below.
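The dataset, table, and database names here are made up for illustration:

    #!/bin/bash
    # Run from cron, e.g.:  0 2 * * * /opt/etl/purge_and_load.sh
    set -euo pipefail

    # 1. Get the IDs to purge from BigQuery (drop the CSV header line).
    bq --quiet --format=csv query --use_legacy_sql=false \
      'SELECT id FROM `my_project.my_dataset.purge_candidates`' \
      | tail -n +2 > /tmp/purge_ids.csv

    # 2. Purge those rows in Cloud SQL (connection set up as in the
    #    "Connect from a VM instance" tutorial).
    mysql --host=127.0.0.1 --user=etl --password="${DB_PASSWORD}" my_db \
      -e "DELETE FROM events WHERE id IN ($(paste -sd, /tmp/purge_ids.csv));"

    # 3. Load the next batch into BigQuery.
    bq query --use_legacy_sql=false \
      'INSERT INTO `my_project.my_dataset.history`
       SELECT * FROM `my_project.my_dataset.staging` WHERE load_date = CURRENT_DATE()'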
Do you see any gotchas in "option 2" described above that I am contemplating?
Is there any other option that you can suggest? I wonder whether Cloud Dataflow is an appropriate solution for a task that involves running queries across multiple databases (Cloud SQL and BigQuery in this case) and using intermediate query results in subsequent queries.
Thinking about your option 2, I would probably consider Dataflow, Cloud Functions, or Cloud Run as the main backbone for those operations instead of VMs. You might find serverless solutions much cheaper and more reliable, depending on your wider context and on how frequently the "periodic" process is to run.
On the other hand, if you (or your company) already have relevant experience running code on VMs, but no skills, knowledge, or experience with the serverless solutions, the "education overhead" can increase the overall cost of that path.
To orchestrate your queries, you need an orchestrator. There are two on Google Cloud:
The big one: Cloud Composer, which is enterprise grade (in features and in cost!)
The newer one: Cloud Workflows, which is serverless and easy to use
I recommend using Cloud Workflows. You can create a workflow of the calls that you perform against BigQuery (federated queries or not).
If you need to update/delete data in Cloud SQL, I recommend creating a proxy Cloud Function (or Cloud Run service) that takes a SQL query as a parameter and executes it on Cloud SQL.
You can call your Cloud Function (or Cloud Run service) from Workflows; it's only an HTTP call to perform.
Workflows also offers some processing capacity on the responses gathered from your API calls, so you can parse a response, iterate over it, and call the subsequent step, even injecting data coming from previous steps.
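As a rough sketch of that setup (the function, workflow, and file names are all hypothetical, and the workflow definition itself would live in workflow.yaml):

    # Deploy an HTTP-triggered Cloud Function that receives a SQL statement
    # and executes it against Cloud SQL.
    gcloud functions deploy sql-proxy \
        --runtime=python310 --trigger-http --region=us-central1 \
        --set-env-vars=INSTANCE_CONNECTION_NAME=my-project:us-central1:my-instance

    # Deploy the workflow that chains the BigQuery calls and the proxy calls.
    gcloud workflows deploy purge-and-load --source=workflow.yaml --location=us-central1

    # Run it on demand (or schedule executions with Cloud Scheduler).
    gcloud workflows run purge-and-load --location=us-central1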
I am trying to migrate data from GCP Cloud SQL for SQL Server to AWS Aurora MySQL using DMS CDC. For this I need to enable CDC on the DMS source database, which is Cloud SQL. As per the AWS documentation, I need to enable CDC by executing the "sp_cdc_enable_db" stored procedure, and that requires sysadmin access, but Google Cloud doesn't support sysadmin access. So, in this scenario, how do I enable CDC?
As you know, Cloud SQL doesn't support sysadmin access, and it doesn't support the CDC feature either.
So you have to use a different method for the migration process.
If you really want to use CDC, I recommend using middle-man replication between GCP Cloud SQL and AWS Aurora MySQL.
Just replicate your Cloud SQL database to an on-premises server, or somewhere else where you can enable CDC.
Then migrate from that SQL Server replica to AWS Aurora using AWS DMS. However, Aurora will not be synced with the source DB in Cloud SQL.
Or, if you just want both DBs to be in sync, have you tried the steps described in this AWS document?
I think the "Migrating existing data and replicating ongoing changes" section does exactly what you want.
There seems to be an increasing overlap and proliferation of cloud database technologies.
In order to make sense of it, a comparative approach might help.
What are the exact differences between Google Cloud Firestore and Google Cloud Spanner?
Cloud Firestore is:
A flexible, NoSQL (non-relational) scalable database for mobile, web, and server development from Firebase and Google Cloud Platform.
On the other hand, Cloud Spanner:
Horizontally scalable, strongly consistent, relational database service.
So the main difference between them is that one is a non-relational database while the other is relational. Furthermore, Cloud Firestore is also a real-time database, which means that for every change that takes place in the database you are instantly notified.
Cloud Firestore is a fast, fully managed, serverless, cloud-native NoSQL document database that simplifies storing, syncing, and querying data for your mobile, web, and IoT apps at global scale. Its client libraries provide live synchronization and offline support, and its security features and integrations with Firebase and GCP accelerate building truly serverless apps.
Cloud Firestore supports ACID transactions. With automatic multi-region replication and strong consistency, your data is safe and available, even when disasters strike. Cloud Firestore even allows you to run sophisticated queries against your NoSQL data without any degradation in performance.
Cloud Spanner is a service built for the cloud specifically to combine the benefits of relational database structure with non-relational horizontal scale. This service can provide petabytes of capacity and offers transactional consistency at global scale, schemas, SQL, and automatic, synchronous replication for high availability. Use cases include financial applications and inventory applications traditionally served by relational database technology.
Right now I am using Microsoft SQL Server as the database for my dev app. If in the future I want to migrate my database to Google Spanner, what guidelines should I follow from now on so that the migration will be easy later? Also, does Google provide any migration tools, like Microsoft® Data Migration Assistant?
Does Spanner have a local emulator that I can install on my local machine to test against before creating a real instance? For reference, this is what instance creation looks like:

    gcloud spanner instances create --help

    SYNOPSIS
        gcloud spanner instances create INSTANCE --config=CONFIG
            --description=DESCRIPTION --nodes=NODES [--async] [GLOBAL-FLAG ...]
Cloud Spanner is Google's horizontally scalable relational database. It is quite expensive (running it in a minimal configuration with 3 nodes would cost you at least $100 daily). Unless you really need the horizontal scalability, you should use Cloud SQL.
Cloud SQL is a managed MySQL or PostgreSQL service from Google. You can migrate your data to Cloud SQL easily, as this is a common use case. How you do it depends on your choice of engine. For example, check this question for exporting to MySQL, or this link for converting to PostgreSQL.
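For example, once the data has been converted into a MySQL-compatible dump (the instance and bucket names below are placeholders), the import itself is only a couple of commands:

    # Stage the converted dump in Cloud Storage, then import it into Cloud SQL.
    gsutil cp converted_dump.sql gs://my-migration-bucket/
    gcloud sql import sql my-cloudsql-instance \
        gs://my-migration-bucket/converted_dump.sql --database=my_db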
Check Google's decision tree if you are unfamiliar with the details of Google's storage options.
Let's say a company has an application with a database hosted on AWS, and also has a read replica on AWS. Then that same company wants to build out a data analytics infrastructure in Google Cloud, to take advantage of the data analysis and ML services there.
Is it necessary to create an additional read replica within the Google Cloud context? If not, is there an alternative strategy that is frequently used in this context to bridge the two cloud services?
While services like Amazon Relational Database Service (RDS) provide read-replica capabilities, that replication only works between managed database instances on AWS.
If you are replicating a database between providers, then you are probably running the database yourself on virtual machines rather than using a managed service. This means the databases appear just like any other resource on the internet, so you can connect them exactly the way you would connect any two resources across the internet. However, you would be responsible for managing, monitoring, deploying, etc., which takes away much of the benefit of using cloud services.
Replicating between storage services like Amazon S3 would be easier, since it is just raw data rather than a running database. Also, big data is normally stored in raw format rather than being loaded into a database.
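For example, a one-off bucket-to-bucket copy can be as simple as the following (bucket names are placeholders, and it assumes AWS credentials are configured for gsutil; Storage Transfer Service is the managed alternative for scheduled copies):

    gsutil -m rsync -r s3://my-aws-export-bucket gs://my-gcp-analytics-bucket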
If the existing infrastructure is on a cloud provider, then try to perform the remaining activities on the same cloud provider.