Let's say a company has an application with a database hosted on AWS and also has a read replica on AWS. Then that same company wants to build out a data analytics infrastructure in Google Cloud -- to take advantage of data analysis and ML services in Google Cloud.
Is it necessary to create an additional read replica within the Google Cloud context? If not, is there an alternative strategy that is frequently used in this context to bridge the two cloud services?
While services like Amazon Relational Database Service (RDS) provides read-replica capabilities, it is only between managed database instances on AWS.
If you are replicating a database between providers, then you are probably running the database yourself on virtual machines rather than using a managed service. This means the databases appear just like any resource on the Internet, so you can connect them exactly the way you would connect two resources across the internet. However, you would be responsible for managing, monitoring, deploying, etc. This takes away from much of the benefit of using cloud services.
Replicating between storage services like Amazon S3 would be easier since it is just raw data rather than a running database. Also, Big Data is normally stored in raw format rather than being loaded into a database.
If the existing infrastructure is on a cloud provider, then try to perform the remaining activities on the same cloud provider.
Related
I deploy my app to AWS.
On AWS there are RDS which support some industrial standard DBMS like PostgreSQL/MySQL/Oracle.
These dbms can be make available on development machine (docker) as well, make it easy to achieve dev/prod parity.
I'm looking for a time series specialized database that I can achieve dev/prod as well.
AWS has Timestream that is specialized for time series, but I'm clueless of a local equivalent database for it.
There probably some EC2-hosted database possible, but I prefer to be lazy and have Amazon take care of manage the database cluster for me.
What options do I have?
Apache Druid is a very good Time-Series Database that can be deployed on local development environments and on multiple cloud environments easily.
Druid is ofered as a fully managed cloud service, on AWS, by Imply.
The fully managed variant of Druid is called Imply Cloud.
More information: https://imply.io/product/imply-cloud
You should try Amazon Timestream
It is as a nonrelational, fully managed service built specifically to collect, store and process time-series data. The arrival of masses of IoT data is expected to push time-series technology into wider use,that's why Amazon came with Timestream.
In general any cloud service provider, GCP in this context, is it not relevant and mandatory for Google to specifically allow consumers to choose data residency and data processing region option for all services? Else serverless option will have serious adoption issue. Please clarify.
Google Cloud have two types of the products available: that have specified location and available globally.
You can deploy resources in specific location, multi-regional for:
Compute: Compute Engine, App Engine, Google Kubernetes Engine, Cloud Functions
Storage & Databases: Cloud Storage, Bigtable, Spanner, Cloud SQL, Firestore, Memorystore, Persistent Disk...
BigData & Machine learning: BigQuery, Composer, Dataflow, Dataproc, AI training,
Networking: VPC, Cloud Load Balancing,
Developer Tools...
Following products are available only globally: Networking, Big Data Pub/Sub, Machine Learning like vision API, Management Tools, Developer Tools, IAM.
For detailed list please check Google Cloud Locations Documentation
Even if the product is available globally, for example PubSub: it is possible to specify where messages are stored.
If the data in transit are the concern, you have to be aware that Google Cloud Platform uses data encryption at Rest. It consists on several layers of encryption to protect customer data.
with Google cloud platform, cloud SQL you get a lot of options to setup the infrastructure. Does this mean cloud SQL is infrastructure as a service ?
No, the infrastucture of Cloud SQL is managed by Google and by it's engineers, so, Cloud SQL is PAAS (Plaform As A Service).
Cloud SQL is a docker container built on top of a GCE instance, and Google monitor everything for you, and fix the Cloud SQL instance automatically if something goes wrong (Sometimes Google software engineers have to perform some actions to fix some issues if the instance is stuck). So, the only thing that you have to take care of is to store and query your data.
Also, Cloud SQL offers a lot of interesting features, such as, failover replicas, read replicas, user and database adminitration, etc.
So, in Cloud SQL, Google doesn't just sell the infrastucture to create databases, but also the application itself and the monitoring tools too.
Sorry I'm new to web server. I want to deploy a cloud server for user data:
User can login using web, with verification code sent to user's phone.
User can manipulate his data (add/modify/remove) when login.
Android/iPhone client can manipulate user data when login.
Server should have a database for storage, SQLLite or others.
It would be good to use Amazon/Ali-cloud cloud service, provided it can speed up my deployment. I'm not sure if I need run into blobs such as H5, PHP/JSP, node.js or others. Can you provide a guide for me, web link or book?
And, what's the most popular programming interface between Android/IOS app and cloud server? http post/get or other wrapper ?
Surely you can speed up your deployment using Amazon Web Services. This is my recommendation:
For Webserver,
Amazon EC2: Launch an instance where you can install Apache/Nginx
here. You will need a RDS instance running parallel with your server
which will lower your need on server CPU/Mem, but will cost also.
For Database, you can have many approach ways here:
Amazon RDS: Launch an instance where you host your Database
(mysql/...). This one will provide you with Database Name, Hostname,
Users, ... which you can use to connect with your webserver in EC2.
Your Android/IOS application can use RDS information for the database
connection.
Amazon DynamoDB: Fast, Flexible for NoSQL (wonder if you want to use
traditional database or NoSQL?): https://aws.amazon.com/amplify/
For Mobile/Website access control,
AWS Cognito: Great for user-accounts, designed for real-time data
model: https://aws.amazon.com/cognito/?nc1=f_ls
For serverless if you want to GET/PUT API on your webserver for
easier,
AWS Lambda: https://aws.amazon.com/lambda/?nc1=f_ls
Taking into account that you are just starting with your application, I would suggest going with serverless architecture with AWS Lambda running your business logic.
Key benefits:
No server management = spend time on building your application vs on maintaining infrastructure
Flexible scaling = scale based on what you really need
Pay for value = don't pay for resources that you don't need
Automated high availability = serverless provides built-in availability and fault tolerance
To learn more on serverless, you may want to check Building Serverless Web Applications - 2017 AWS Online Tech Talks.
Now when it comes to going deep, I would suggest checking online trainings available from acloud.guru, cloud academy, udemy or linuxacademy for serverless and also for the development language you want to use (Node.js is often used for such scenarios).
This is not a duplicate question. I am just confused in Iaas,Saas with respect to AWS services like Dynamo, RDS, RedShift and Kinesis etc. They helps users to create database So, should we categorize them in Iaas or Saas?
Thanks
To help you understand, SaaS is Software as a Service. It's more like an on demand application where you don't have to worry about configurations, accesses, whitelisting etc. For instance, Google Maps (or Google Apps).
IaaS or Infra as a Service gives you more flexibility in terms of spawning of nodes and clusters, to deal with security services at IP and Port levels, manage access control and authentication etc. On AWS, you may specify what all private or public IPs will have access to your system, whether you prefer to go with dense storage or dense compute nodes for your warehouse, rotate your log files etc.
A page on Amazon RDS reads -
When you buy a server, you get CPU, memory, storage, and IOPS, all
bundled together. With Amazon RDS, these are split apart so that you
can scale them independently.
So, in short... Services like AWS and Azure are mostly now either IaaS or PaaS.