I have a number of databases in Azure that I want to back up in AWS. What is the best type of storage for databases in AWS?
Can this be automated in Azure?
In the 'old days' before Cloud Computing, backup typically involved sending data to a secondary disaster recovery location, where there was (typically inadequate) backup equipment that could take over the activities of the primary data center.
These days, Cloud Computing providers such as AWS and Azure run multiple data centers in each region. A 'Region' contains multiple 'Availability Zones', each of which is a separate data center.
Also, many services (eg Amazon S3, Azure Blob storage) are 'regional' services that automatically run across multiple Availability Zones. This means that a failure in one AZ does not impact operation or availability of the service. However, individual virtual machines (eg Amazon EC2, Azure VMs) run on single hosts, so each one operates in only a single AZ.
Thus, rather than attempting to copy data to a "different location" or a different cloud service, it is better to take advantage of the backup capabilities offered by the cloud provider.
From Automatic, geo-redundant backups - Azure SQL Database | Microsoft Learn:
By default, Azure SQL Database stores backups in geo-redundant storage blobs that are replicated to a paired region. Geo-redundancy helps protect against outages that affect backup storage in the primary region. It also allows you to restore your databases in a different region in the event of a regional outage.
The storage redundancy mechanism stores multiple copies of your data so that it's protected from planned and unplanned events. These events might include transient hardware failure, network or power outages, or massive natural disasters.
This would not only meet your requirement for backing up data to another location, but it also makes it quick and easy to restore data when necessary. Compare that to sending data to a different cloud provider, where you would be responsible for converting file formats, launching replacement services and loading data from backup. That type of thing really isn't necessary if you are using a managed database service.
Backing up data is easy. Restoring is hard!
Bottom line: Use a managed database (eg Azure SQL Database) and use the managed backup options they provide. They will give you the redundancy you seek, while making the process MUCH easier to manage.
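If you ever want to exercise that geo-restore path programmatically rather than through the portal, a rough sketch with the azure-mgmt-sql Python SDK might look like the following. All names, resource IDs, and the exact model fields here are illustrative assumptions and can differ between SDK/API versions, so treat it as a starting point rather than a recipe.

```python
# Hypothetical geo-restore sketch: create a new database in the paired region
# from the geo-replicated backup. Every name/ID below is a placeholder.
from azure.identity import DefaultAzureCredential
from azure.mgmt.sql import SqlManagementClient
from azure.mgmt.sql.models import Database

client = SqlManagementClient(DefaultAzureCredential(), subscription_id="<subscription-id>")

# Resource ID of the geo-replicated ("recoverable") backup of the original database
recoverable_db_id = (
    "/subscriptions/<subscription-id>/resourceGroups/rg-primary"
    "/providers/Microsoft.Sql/servers/sql-primary/recoverableDatabases/appdb"
)

poller = client.databases.begin_create_or_update(
    resource_group_name="rg-recovery",
    server_name="sql-secondary",        # a server you have already created in the paired region
    database_name="appdb-georestored",
    parameters=Database(
        location="westus2",             # the paired (recovery) region
        create_mode="Recovery",         # "Recovery" = restore from the geo-replicated backup
        source_database_id=recoverable_db_id,
    ),
)
print(poller.result().name)
```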
Related
I have a database hosted in my company's local data centre (the source) and another cloud-hosted database (an AWS RDS Postgres online data store).
The local (on-prem) database is updated on an intraday basis (every 1-2 hours). How can I ensure that the new data is moved to the RDS database as soon as changes/updates occur in the local source database? (We need this updated data from the source to run specific processes/business logic on the RDS database as soon as changes occur in the source database.)
Would AWS DMS or AWS Kinesis be sufficient for this use case?
Try to implement native replication from Postgres; it would be the best method: https://hevodata.com/learn/postgresql-streaming-replication/
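As a concrete starting point, a minimal sketch of that approach is below. Note that what RDS for PostgreSQL can consume from an external server is logical replication (publication/subscription) rather than physical streaming replication, so that is what the sketch sets up. Hostnames, credentials, and table names are placeholders; the target schema must already exist on RDS, and the source needs wal_level=logical plus a user allowed to replicate.

```python
# Minimal sketch (hosts, credentials and table names are placeholders) of native
# logical replication from the on-prem Postgres (publisher) into RDS Postgres
# (subscriber) using psycopg2.
import psycopg2

# 1. On the on-prem source: publish the tables that must flow to RDS
src = psycopg2.connect("host=onprem.example.local dbname=appdb user=repl_admin password=CHANGE_ME")
src.autocommit = True
with src.cursor() as cur:
    cur.execute("CREATE PUBLICATION rds_pub FOR TABLE orders, customers;")
src.close()

# 2. On the RDS target: subscribe once; changes then stream continuously, so the
#    business logic on RDS sees updates within seconds rather than every 1-2 hours
dst = psycopg2.connect("host=mydb.abc123.eu-west-1.rds.amazonaws.com dbname=appdb user=rds_admin password=CHANGE_ME")
dst.autocommit = True  # CREATE SUBSCRIPTION cannot run inside a transaction block
with dst.cursor() as cur:
    cur.execute(
        "CREATE SUBSCRIPTION rds_sub "
        "CONNECTION 'host=onprem.example.local dbname=appdb user=repl_admin password=CHANGE_ME' "
        "PUBLICATION rds_pub;"
    )
dst.close()
```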
The Datastore docs say:
the replication between Datastore servers. Replication is managed by Cloud Bigtable and Megastore, the underlying technologies for Datastore
The Bigtable docs say:
Replication for Cloud Bigtable enables you to increase the availability and durability of your data by copying it across multiple regions or multiple zones within the same region.
How can I see in the datastore UI if I'm getting any replication? If I am getting replication how can I see if I'm getting cross region or cross zone replication for my datastore entities?
(The entities I'm looking at have been populated since 2017 if that's useful.)
The short answer to your question is that if you are in a multi-region location, you can already access your data from multiple regions without worrying about asynchronous replication lag.
If you are really curious about Megastore replication, you can read the Megastore paper. However, what you more likely want is to read about the trade-offs between strong consistency & eventual consistency in Datastore.
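To make that trade-off concrete, here is a small sketch with the google-cloud-datastore Python client; the kind and key names are made up for illustration. Ancestor queries are strongly consistent, while global queries are only eventually consistent, and in practice that distinction matters far more than which zone happened to serve the replica.

```python
# Illustrative only: strongly consistent (ancestor) vs eventually consistent
# (global) queries in Datastore. "TaskList"/"Task" are hypothetical kinds.
from google.cloud import datastore

client = datastore.Client()

# Ancestor query: strongly consistent, always reflects the latest committed write
parent_key = client.key("TaskList", "default")
strong_query = client.query(kind="Task", ancestor=parent_key)
print(list(strong_query.fetch()))

# Global query: eventually consistent, a very recent write may not show up yet
eventual_query = client.query(kind="Task")
print(list(eventual_query.fetch()))
```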
The locations for Cloud Datastore currently match those of Cloud Firestore in either mode.
Cloud Datastore is only a regional service. You can't deploy it in multiple regions in the same project.
Its brother (or sister, I don't know), Firestore, can be deployed multi-region.
So, Datastore is mono-region, but multi-zonal within that single region, and the Bigtable replication mechanism is used to achieve this replication. You can't see this; it's serverless and transparent.
I have a doubt: if in AWS all server-side work is done by the cloud provider, then why do we store backups of the database?
I have read in the documentation that everything database-related is managed by the cloud service provider. So what is the need for storing backups if the service provider does everything for me?
You maintain your own backups of RDS instances for the same reason that you maintain offsite backups of on-premise databases: disaster recovery. In your own data center, a fire or terrorism or natural disaster could destroy both your database and your local backups. In the cloud, these disasters tend to take on a different form.
If all of your data is in any one place, then you are vulnerable to data loss in a catastrophic event, which could take a number of forms: a serious defect in the cloud provider's infrastructure (unlikely with AWS, but nothing is impossible), human error, malicious employees, a compromise of your credentials, or any other of a number of statistically-unlikely events -- the low probability of which becomes irrelevant when it occurs.
If you value your data, you back it up independently and outside of its native environment.
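For example, with Amazon RDS one low-effort way to keep an independent copy outside the primary region is to copy a snapshot cross-region. A hedged boto3 sketch, where the identifiers, account ID, and regions are all placeholders:

```python
# Copy an RDS snapshot into a second region so a regional disaster cannot take
# out both the database and its backups. All identifiers below are placeholders.
import boto3

# copy_db_snapshot is called on a client in the *destination* region
rds_dest = boto3.client("rds", region_name="eu-west-1")

rds_dest.copy_db_snapshot(
    SourceDBSnapshotIdentifier="arn:aws:rds:us-east-1:123456789012:snapshot:mydb-2024-01-01",
    TargetDBSnapshotIdentifier="mydb-2024-01-01-dr-copy",
    SourceRegion="us-east-1",  # lets boto3 generate the pre-signed URL for the cross-region copy
)
```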
Amazon RDS runs a database of your choice: MySQL, PostgreSQL, Oracle, SQL Server. These are normal databases and operate in the same way as a database you would run yourself.
You are correct that a managed solution takes care of installation, maintenance and hardware issues. Also, you can configure the system to automatically take backups of the data.
From Working With Backups - Amazon Relational Database Service:
Amazon RDS creates and saves automated backups of your DB instance. Amazon RDS creates a storage volume snapshot of your DB instance, backing up the entire DB instance and not just individual databases.
Amazon RDS creates automated backups of your DB instance during the backup window of your DB instance. Amazon RDS saves the automated backups of your DB instance according to the backup retention period that you specify. If necessary, you can recover your database to any point in time during the backup retention period.
You also have the ability to trigger a manual backup. This is advisable, for example, before you do major work on the database, such as modifying schemas when upgrading an application that uses the database.
Bottom line: Amazon RDS can manage the backups for you. You do not need to manage the backup process yourself, but you can also trigger RDS backups manually.
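As a rough boto3 sketch of both paths just described, with placeholder instance identifiers: a manual snapshot before risky schema work, and a point-in-time restore that leans on the automated backups.

```python
# Placeholder identifiers throughout; both calls assume the instance exists and
# that automated backups (a non-zero retention period) are enabled.
import boto3
from datetime import datetime, timedelta, timezone

rds = boto3.client("rds", region_name="us-east-1")

# Manual snapshot, e.g. taken just before a schema migration
rds.create_db_snapshot(
    DBInstanceIdentifier="my-app-db",
    DBSnapshotIdentifier="my-app-db-pre-migration",
)

# Point-in-time restore from the automated backups (any moment inside the retention window)
rds.restore_db_instance_to_point_in_time(
    SourceDBInstanceIdentifier="my-app-db",
    TargetDBInstanceIdentifier="my-app-db-restored",
    RestoreTime=datetime.now(timezone.utc) - timedelta(hours=1),
)
```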
How is state managed between sessions? I know that in Azure, client-specific states are stored in SQL Azure. I'm wondering if this is done similarly in AWS?
Do the various instances of your application all access a DB somewhere where the state is stored? Is state management much different depending on which technologies you are using?
At a 'homework' level, Amazon Web Services is loosely composed of two different sets of things:
infrastructure services (EC2, EBS), which you manage yourself
higher-level services (S3, DynamoDB, ELB), which Amazon manages for you
When you upload a file to S3, it is stored across a number of machines in a number of different data centers, and Amazon is responsible for finding and returning the file when you request it (as well as making sure it doesn't get erased by a machine failure.)
With something built on top of one of the infrastructure services, such as an application running on EC2, you are on your own as to how you store and synchronize state:
One server, state in memory (bad)
Load balancing with no state handling (very bad!)
Load balancing with sticky sessions (sensible, but not enough by itself; if that server falls out of the pool, the other servers have no idea of who you are)
Load balancing with servers with a common state server
How do you store state? Traditionally a database (possibly Amazon RDS) with a memory cache (such as ElastiCache, Amazon's managed memcached-compatible cache). Amazon's new DynamoDB service is a good fit for this use, as a fast, redundant key-value store.
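As a small illustration of the "common state server" idea, here is a hedged boto3 sketch using DynamoDB as the session store. The table name and attributes are made up; the table is assumed to already exist with a string partition key called session_id.

```python
# Hypothetical session store: any server behind the load balancer can read or
# write a session, so losing one web server loses no state.
import boto3

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
sessions = dynamodb.Table("web-sessions")  # assumed table, partition key "session_id"

# One web server writes the session...
sessions.put_item(Item={
    "session_id": "abc123",
    "user_id": "42",
    "cart": ["sku-1", "sku-2"],
})

# ...and any other server can read it back on the next request
item = sessions.get_item(Key={"session_id": "abc123"}).get("Item")
print(item)
```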
Every once in a while I read/hear about AWS, and now I have tried reading the docs.
But those docs seem to be written for people who already know which AWS service they need and are only looking up how to use it.
So, to understand AWS better, I will sketch a hypothetical web application and ask a few questions.
The app's purpose is to modify content like videos or images. A user has some kind of web interface where he can upload his files and adjust some settings, and a server grabs each file and modifies it (e.g. re-encoding). The service also extracts the audio track of a video and tries to index the spoken words so the customer can search within his videos. (Well, it's just hypothetical.)
So my questions:
Given my own domain 'oneofmydomains.com', is it possible to host the complete web interface on AWS? I thought about using GWT to create the interface and just delivering the JS/images via AWS, but via which service, Simple Storage? What about some kind of index.html; is an EC2 instance needed to host a web server that has to run 24/7, causing costs?
Now the user has the interface with a login form. Is it possible to manage logins with an AWS service? Here I am also thinking about an EC2 instance hosting a database, but that would also cause costs, and I'm not sure if there is a better way.
The user has logged in and uploads a file. Which storage solution could be used to save the customer's original and modified content?
Now the user wants to browse the status of his uploads. This means I need some kind of ACL so that the customer only sees his own files. Do I need to use a database (e.g. on EC2) for this, or does Amazon provide some kind of ACL, so the GWT web interface will be secure without any EC2?
The customer's files are re-encoded and the audio track is indexed. Now he wants to search for a video. Which service could be used to create and maintain the index for each customer?
I hope someone can give a few answers so that I better understand how one could use AWS.
Thanks!
Amazon AWS offers a whole ecosystem of services which should cover all aspects of a given architecture, from hosting to data storage, or messaging, etc. Whether they're the best fit for purpose will have to be decided on a case by case basis. Seeing as your question is quite broad I'll just cover some of the basics of what AWS has to offer and what the different types of services are for:
EC2 (Elastic Compute Cloud)
Amazon's cloud solution, which is basically the same as older virtual machine technology, but the 'cloud' offers additional niceties such as automated provisioning, scaling, billing, etc.
you pay for what you use (by the hour); the basic instance (single CPU, 1.7GB RAM) would probably cost you just under $3 a day if you run it 24/7 (on a Windows instance, that is)
there are a number of different OSes to choose from, including Linux and Windows; Linux instances are cheaper to run because there is no license cost associated with Windows
once you've set up the server the way you want, including any server updates/patches, you can create your own AMI (Amazon Machine Image) which you can then use to bring up another identical instance
however, if all your HTML is baked into the image it will make updates difficult, so the normal approach is to include a service (a Windows service, for instance) which pulls the latest deployment package from a storage service (see S3 later) and updates the site at start-up and at intervals
there's the Elastic Load Balancer (which has its own cost but only one is needed in most cases) which you can put in front of all your web servers
there's also the CloudWatch service (again, extra cost) which you can enable on a per-instance basis to help you monitor the CPU, network in/out, etc. of your running instance
you can set up AutoScalers which can automatically bring up or terminate instances based on some metric, e.g. terminate 1 instance at a time if average CPU utilization is less than 50% for 5 mins, bring up 1 instance at a time if average CPU goes beyond 70% for 5 mins (see the sketch after this list)
you can use the instances as web servers, use them to run a DB or a Memcached cluster, etc.; the choice is yours
typically, I wouldn't recommend having Amazon instances talk to a DB outside of Amazon because the round trip is much longer; the usual approach is to use SimpleDB (see below) as the database
the Amazon SDK contains enough classes to help you write a custom monitoring/scaling service if you ever need to, but the AWS console allows you to do most of your configuration anyway
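As a sketch of the CPU-based scaling rules mentioned in the list above (group, policy, and alarm names are placeholders, and the Auto Scaling group is assumed to already exist), the boto3 calls would look roughly like this:

```python
# Hypothetical names throughout; scale out by one instance when average CPU
# stays above 70% for 5 minutes.
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Simple scaling policy: add one instance at a time, with a 5-minute cooldown
policy = autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",
    PolicyName="scale-out-on-cpu",
    AdjustmentType="ChangeInCapacity",
    ScalingAdjustment=1,
    Cooldown=300,
)

# CloudWatch alarm that fires the policy when the metric breaches the threshold
cloudwatch.put_metric_alarm(
    AlarmName="web-asg-high-cpu",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "AutoScalingGroupName", "Value": "web-asg"}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=1,
    Threshold=70.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[policy["PolicyARN"]],
)
```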
SimpleDB
Amazon's non-relational key-value data store. Compared to a traditional database you tend to pay a penalty on per-query performance, but you get high scalability without having to do any extra work.
you pay for usage, i.e. how much work it takes to execute your query
extremely scalable by default; Amazon scales up SimpleDB instances based on traffic without you having to do anything (and without giving you any control, for that matter)
data are partitioned into 'domains' (equivalent to a table in a normal SQL DB)
data are non-relational; if you need a relational model then check out Amazon RDS. I don't have any experience with it, so I'm not the best person to comment on it.
you can still execute SQL-like queries against the database, usually through some plugin or tool; Amazon doesn't provide a front end for this at the moment
be aware of 'eventual consistency': data are duplicated on multiple instances after Amazon scales up your database, and synchronization is not guaranteed when you do an update, so it's possible (though highly unlikely) to update some data, read it back straight away, and get the old data back
there are 'Consistent Read' and 'Conditional Update' mechanisms available to guard against the eventual-consistency problem (see the sketch after this list); if you're developing in .NET, I suggest using the SimpleSavant client to talk to SimpleDB
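Here is a rough sketch of those two mechanisms using boto3's SimpleDB ("sdb") client; the domain, item, and attribute names are purely illustrative, and SimpleDB is a legacy service these days, so check the current API before relying on it.

```python
# Illustrative domain/item/attribute names only.
import boto3

sdb = boto3.client("sdb", region_name="us-east-1")
sdb.create_domain(DomainName="users")  # roughly equivalent to a table

sdb.put_attributes(
    DomainName="users",
    ItemName="user-42",
    Attributes=[{"Name": "plan", "Value": "pro", "Replace": True}],
)

# Consistent read: costs a little more latency, but never returns stale data
result = sdb.get_attributes(DomainName="users", ItemName="user-42", ConsistentRead=True)
print(result.get("Attributes"))

# Conditional update: only applied if the current value is what we expect
sdb.put_attributes(
    DomainName="users",
    ItemName="user-42",
    Attributes=[{"Name": "plan", "Value": "enterprise", "Replace": True}],
    Expected={"Name": "plan", "Value": "pro"},
)
```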
S3 (Simple Storage Service)
Amazon's storage service, again, extremely scalable, and safe too - when you save a file on S3 it's replicated across multiple nodes so you get some DR ability straight away.
you pay for what you use (storage, requests, and data transfer)
files are stored against a key
you create 'buckets' to hold your files, and each bucket has a unique URL (unique across all of Amazon, and therefore across all S3 accounts)
CloudBerry S3 Explorer is the best UI client I've used on Windows
using the Amazon SDK you can write your own repository layer which utilizes S3 (see the sketch below for the raw calls)
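A tiny boto3 sketch of that bucket/key model; the bucket name is a placeholder and must be globally unique, and the local file is assumed to exist.

```python
# Store and fetch a user's upload against a key inside a bucket.
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# Objects live in a bucket under a key you choose
s3.upload_file("video-42-original.mp4", "my-transcoder-uploads", "user-7/video-42/original.mp4")

# ...and are fetched back by the same bucket + key
s3.download_file("my-transcoder-uploads", "user-7/video-42/original.mp4", "/tmp/original.mp4")
```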
Sorry if this is a bit long-winded, but those are the 3 most popular web services that Amazon provides, and they should cover all the requirements you've mentioned. We've been using Amazon AWS for some time now, and there are still some kinks and bugs, but it's generally moving forward and pretty stable.
One downside to using something like AWS is vendor lock-in: whilst you could run your services outside of Amazon in your own data center, or move files out of S3 (at a cost, though), getting out of SimpleDB is likely to represent the bulk of the work during a migration.