We have manual queries that we can run in our databases to create partitions in a table.
For example: ALTER TABLE ... ADD PARTITION (...)
But can we automate this process with some GCP feature?
No, you can't. You could automate the manual queries (Cloud Scheduler + a Cloud Function that queries your Cloud SQL instance), but there is no built-in feature to manage partitions on Cloud SQL.
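For illustration, such an automated job could look roughly like the sketch below: a Cloud Scheduler HTTP job invoking a Python Cloud Function that runs the ALTER TABLE against the instance. The table name, partition naming scheme, environment variables, and the pymysql driver are all assumptions here, not a prescribed setup.

# Hypothetical Cloud Function (Python), invoked monthly by a Cloud Scheduler HTTP job.
import datetime
import os

import functions_framework
import pymysql

@functions_framework.http
def add_monthly_partition(request):
    # Compute the first day of next month and of the month after it.
    today = datetime.date.today()
    next_month = (today.replace(day=1) + datetime.timedelta(days=32)).replace(day=1)
    month_after = (next_month + datetime.timedelta(days=32)).replace(day=1)
    partition_name = "p%s" % next_month.strftime("%Y%m")

    # Hypothetical "events" table, range-partitioned on TO_DAYS(event_date).
    statement = (
        "ALTER TABLE events ADD PARTITION "
        "(PARTITION %s VALUES LESS THAN (TO_DAYS('%s')))"
        % (partition_name, month_after.isoformat())
    )

    conn = pymysql.connect(
        host=os.environ["DB_HOST"],      # e.g. the instance's private IP or the Cloud SQL Auth proxy
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASS"],
        database=os.environ["DB_NAME"],
    )
    try:
        with conn.cursor() as cur:
            cur.execute(statement)
        conn.commit()
    finally:
        conn.close()
    return "added partition %s" % partition_name, 200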
I need to perform periodic data purge and data load operations between GCP BigQuery and GCP Cloud SQL.
This involves running multiple queries in GCP BigQuery and GCP Cloud SQL in a predetermined sequence, using the query results from earlier queries in subsequent queries.
I am considering a few options, as described below.
Option 1: Use BigQuery "Scheduled Queries" with "federated queries" (https://cloud.google.com/bigquery/docs/cloud-sql-federated-queries). This is good as long as the job only involves triggering read-only queries in the GCP Cloud SQL database and running multiple queries in GCP BigQuery.
However, since my operation involves purging data from GCP Cloud SQL, federated queries are ruled out.
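For reference, a read-only federated query in option 1 would look something like this (just a sketch; the project, connection ID, and table name are placeholders):

from google.cloud import bigquery

client = bigquery.Client()
sql = """
SELECT id, created_at
FROM EXTERNAL_QUERY(
  'my-project.us.my-cloudsql-connection',
  'SELECT id, created_at FROM orders;')
"""
# Federated queries can only read from Cloud SQL; the results land in BigQuery.
for row in client.query(sql).result():
    print(dict(row))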
Option 2: Another option I am considering is to use a GCP Compute Engine Linux VM as my controller for performing operations that span a GCP Cloud SQL MySQL database and GCP BigQuery.
I can run a cron job to schedule the operation.
As far as running GCP Cloud SQL queries from a GCP Compute Engine VM goes, that is well documented by Google in the tutorial "Connect from a VM instance" (learn how to connect to your Cloud SQL instance from a Compute Engine VM instance).
And, for triggering the GCP BigQuery queries, the bq command-line tool (https://cloud.google.com/bigquery/docs/bq-command-line-tool) provides a good option.
This should allow me to run a sequence of interlaced BigQuery and GCP Cloud SQL queries.
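Roughly, the controller script that cron would run on the VM could look like this (a sketch only; the queries, table names, and connection details are placeholders):

import json
import subprocess

import pymysql

def bigquery_rows(sql):
    # The bq CLI (part of the Cloud SDK) must be installed and authenticated on the VM.
    result = subprocess.run(
        ["bq", "query", "--use_legacy_sql=false", "--format=json", sql],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)

def main():
    # Step 1: ask BigQuery which records should be purged (placeholder query).
    rows = bigquery_rows("SELECT id FROM mydataset.expired_records")
    ids = [(row["id"],) for row in rows]
    if not ids:
        return
    # Step 2: purge those rows from Cloud SQL (placeholder connection details).
    conn = pymysql.connect(host="10.0.0.3", user="purger", password="secret", database="appdb")
    try:
        with conn.cursor() as cur:
            cur.executemany("DELETE FROM records WHERE id = %s", ids)
        conn.commit()
    finally:
        conn.close()

if __name__ == "__main__":
    main()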
Do you see any gotchas in "option 2" described above that I am contemplating?
Is there any other option that you can suggest? I wonder if Cloud Dataflow is an appropriate solution for a task that involves running queries across multiple databases (Cloud SQL and BigQuery in this case) and using the intermediate query results in subsequent queries.
Thinking about your option 2, I would probably consider Dataflow, Cloud Functions, or Cloud Run as the main backbone for those operations instead of VMs. You might find serverless solutions much cheaper and more reliable, depending on your wider context and on how frequently the "periodic" process is to run.
On the other hand, if you (or your company) already have relevant experience running "some code" on a VM, but no skills, knowledge, or experience with the serverless solutions, the "education overhead" can increase the overall cost of this path.
To orchestrate your queries, you need an orchestrator. There are two on Google Cloud:
The big one: Cloud Composer, enterprise grade (in features and cost!)
The new one: Cloud Workflows, serverless and easy to use
I recommend you use Cloud Workflows. You can create your workflow of calls that you perform to BigQuery (federated queries or not).
If you need to update/delete data in Cloud SQL, I recommend you create a proxy Cloud Function (or Cloud Run service) that takes a SQL query as a parameter and executes it on Cloud SQL (a rough sketch follows below).
You can call your Cloud Function (or Cloud Run service) from Workflows; it's only an HTTP call to perform.
Workflows also offers some processing capability on the answers gathered from your API calls, so you can parse a response, iterate over it, and call the subsequent step, even injecting data coming from previous steps.
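A minimal sketch of such a proxy Cloud Function, assuming Python, the pymysql driver, and database credentials passed through environment variables (none of this is a prescribed pattern):

import os

import functions_framework
import pymysql

@functions_framework.http
def sql_proxy(request):
    # The Workflows step POSTs a JSON body like {"query": "DELETE FROM ..."}.
    statement = (request.get_json(silent=True) or {}).get("query")
    if not statement:
        return ("missing 'query' field", 400)
    conn = pymysql.connect(
        host=os.environ["DB_HOST"],
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASS"],
        database=os.environ["DB_NAME"],
    )
    try:
        with conn.cursor(pymysql.cursors.DictCursor) as cur:
            affected = cur.execute(statement)
            rows = cur.fetchall() if cur.description else []
        conn.commit()
    finally:
        conn.close()
    # Workflows can parse this JSON response and feed it into the next step.
    return {"affected_rows": affected, "rows": rows}

The Workflows step then only needs to make an authenticated HTTP POST to this function and can use the JSON response in subsequent steps.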
I have created a database in a Redshift cluster, and now I want to see the database and its tables manually instead of querying them.
Where can I see those databases?
create database example1;
With Redshift, there is no way to look at the data other than by issuing queries and commands against it. This is fairly common for most DBMS products.
AWS "recommend" the free tool SQL Workbench/J:
https://docs.aws.amazon.com/redshift/latest/mgmt/connecting-using-workbench.html
In addition, you can issue commands against Redshift using the query editor in the AWS Management Console:
https://docs.aws.amazon.com/redshift/latest/mgmt/query-editor.html
My personal favorite (as a professional developer) is to use the Jetbrains DataGrip product.
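If you just want a quick listing of what you created, any SQL client will do; for example, a minimal Python sketch using psycopg2 against the cluster (the endpoint, database name, and credentials are placeholders):

import psycopg2

conn = psycopg2.connect(
    host="my-cluster.abc123xyz.us-east-1.redshift.amazonaws.com",  # cluster endpoint
    port=5439,
    dbname="example1",
    user="awsuser",
    password="...",
)
with conn.cursor() as cur:
    # SVV_TABLE_INFO lists the user tables in the current database.
    cur.execute('SELECT "schema", "table" FROM svv_table_info ORDER BY 1, 2')
    for schema, table in cur.fetchall():
        print(schema, table)
conn.close()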
Right now I am using Microsoft SQL Server for the database in my dev app. If in the future I want to migrate my database to Google Spanner, what guidelines should I follow from now on so that the migration will be easy later? Also, does Google provide any migration tools like Microsoft® Data Migration Assistant?
Does Spanner have any local emulator so I can install it on my local machine and test it beforehand?
From gcloud spanner instances create --help:
SYNOPSIS
    gcloud spanner instances create INSTANCE --config=CONFIG
        --description=DESCRIPTION --nodes=NODES [--async] [GLOBAL-FLAG ...]
Cloud Spanner is Google's horizontally scalable relational database. It is quite expensive (running it in a minimal configuration with 3 nodes would cost you at least $100 daily). Unless you really need the horizontal scalability, you should use Cloud SQL.
Cloud SQL is a managed MySQL or PostgreSQL service by Google. You can migrate your data to Cloud SQL easily, as this is a common use case. How you do it depends on your choice. For example, check this question for exporting it to MySQL, and this link for converting to PostgreSQL.
Check Google's decision tree if you are unfamiliar with the details of Google's storage options:
I created the RDS instance in the AWS console, then created the table and loaded the SQL script. Am I able to see the table and data for this RDS instance in the AWS console?
No, you cannot see the RDS data (tables, rows, etc.) in the AWS Management Console.
To see the data, you'll need the appropriate client depending on the RDS engine type. Some examples:
MySQL: MySQL Workbench
SQL Server: SQL Server Management Studio
PostgreSQL: pgAdmin
Oracle: Oracle SQL Developer
It's possible to achieve this using AWS Glue - https://aws.amazon.com/blogs/big-data/how-to-access-and-analyze-on-premises-data-stores-using-aws-glue/
You can actually achieve this using the RDS query editor.
Type this command:
select * from information_schema.tables;
You will have to visually search for your tables here. Look through the "table_name" column until you can identify them. Every time I've used this command, the database tables I created were listed either first or very last. It's not a perfect way to do it, but it will usually suffice, and you don't need any extra services or software to achieve it.
You can use the Query Editor to list the tables you've created using this SQL:
select * from information_schema.tables where TABLE_SCHEMA = 'name of your database goes here';
We are currently using AWS RDS for our databases. On some tables, we have defined insert or update triggers. I would like to know whether BigQuery also supports triggers.
Thanks
BigQuery is a data warehouse product, similar to AWS Redshift and AWS Athena, and there is no trigger support.
If you have used AWS RDS so far, you should check out Google Cloud SQL.
Google Cloud SQL is an easy-to-use service that delivers fully managed SQL databases in the cloud. Google Cloud SQL provides either MySQL or PostgreSQL databases.
If you have a heavy load, then check out Google Cloud Spanner; it's even better as a fully scalable relational DB.
Cloud Spanner is the only enterprise-grade, globally-distributed, and strongly consistent database service built for the cloud specifically to combine the benefits of relational database structure with non-relational horizontal scale.
BigQuery doesn't have this feature, as stated by the colleague above.
However, it has an event API based on its audit logs. You can inspect them and trigger events with Cloud Functions, as per:
https://cloud.google.com/blog/topics/developers-practitioners/how-trigger-cloud-run-actions-bigquery-events
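As a very rough illustration, the receiving side could be a CloudEvent-triggered Cloud Function like the sketch below; the exact audit-log payload fields used here are an assumption and should be checked against the log entries your jobs actually produce.

import functions_framework

@functions_framework.cloud_event
def on_bigquery_event(cloud_event):
    # Eventarc delivers the BigQuery audit log entry as the event payload.
    payload = cloud_event.data.get("protoPayload", {})
    # React to the event, e.g. when a job touches a given table.
    print("method:", payload.get("methodName"))
    print("resource:", payload.get("resourceName"))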
Regards