Where to find Automatic and Manual DocumentDB snapshots in S3?

I see that AWS DocumentDB creates automatic snapshots daily, and I can also create manual snapshots from the AWS Console. The documentation says that snapshots are saved in S3, but they are not visible to me in S3.
I basically want to move the DocumentDB data to S3 in order to propagate it further to other AWS services for monitoring purposes. I was wondering whether I could trigger a manual snapshot daily and have a Lambda fire on the S3 file upload by DocumentDB.
How can I see the automatic and manual snapshots created by DocumentDB in S3?

Backups in Amazon DocumentDB are stored in service-managed S3 buckets, so there is no way to access the backups directly.
Two options here are:
1. Use mongodump/mongoexport on a schedule: https://docs.aws.amazon.com/documentdb/latest/developerguide/backup_restore-dump_restore_import_export_data.html
2. Use change streams to incrementally write to S3 (see the sketch after this list): https://docs.aws.amazon.com/documentdb/latest/developerguide/change_streams.html
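As an illustration of the second option, here is a minimal sketch (not an official example) that tails a DocumentDB change stream with pymongo and writes each event to S3 with boto3. The connection string, database, collection, and bucket names are hypothetical, and it assumes change streams are already enabled on the collection:

```python
import boto3
from bson import json_util
from pymongo import MongoClient

# Hypothetical connection string; a real DocumentDB client also needs the
# cluster CA bundle (tlsCAFile) and real credentials.
client = MongoClient("mongodb://user:password@my-docdb-cluster:27017/?tls=true&replicaSet=rs0")
s3 = boto3.client("s3")

collection = client["mydb"]["mycollection"]  # hypothetical database/collection

# Assumes change streams have been enabled for this collection.
with collection.watch(full_document="updateLookup") as stream:
    for event in stream:
        # Use the resume token as a unique object key for each change event.
        key = f"docdb-changes/{event['_id']['_data']}.json"
        s3.put_object(
            Bucket="my-backup-bucket",      # hypothetical bucket
            Key=key,
            Body=json_util.dumps(event),    # BSON-safe JSON serialization
        )
```

Run from an EC2 instance or container in the same VPC as the cluster; a Lambda on a schedule can also resume the stream from the last stored token.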

Related

Access AWS CloudWatch Logs and Metrics Off-Prem

AWS allows us to capture all kinds of metrics and logs through CloudWatch. Are these data accessible outside the AWS cloud environment (assuming proper permissions and policies are set to allow it)?
For example, could these data be backed up and stored on-prem?
I imagine a Lambda function could be created to access, say, S3 data and fetch it through the API Gateway, but is CloudWatch data stored in S3?
Log data in CloudWatch is stored in S3 buckets that we cannot access. However, you can export the logs to an S3 bucket of your own.
The documentation says:
You can export log data from your log groups to an Amazon S3 bucket and use this data in custom processing and analysis, or to load onto other systems.
...
To begin the export process, you must create an S3 bucket to store the exported log data. You can store the exported files in your Amazon S3 bucket and define Amazon S3 lifecycle rules to archive or delete exported files automatically.
Then you can simply download the files from S3 or feed them into other services as you like.
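For reference, here is a minimal sketch (hypothetical log group and bucket names) of kicking off such an export with boto3's create_export_task. The destination bucket must already exist and have a bucket policy that allows CloudWatch Logs to write to it:

```python
import time

import boto3

logs = boto3.client("logs")

now_ms = int(time.time() * 1000)
one_day_ms = 24 * 60 * 60 * 1000

response = logs.create_export_task(
    taskName="daily-log-export",
    logGroupName="/my/app/log-group",      # hypothetical log group
    fromTime=now_ms - one_day_ms,          # last 24 hours, in ms since epoch
    to=now_ms,
    destination="my-log-archive-bucket",   # hypothetical bucket
    destinationPrefix="cloudwatch-exports",
)
print(response["taskId"])                  # export tasks run asynchronously
```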
The raw metrics stored in CloudWatch Metrics are not accessible. For example, you cannot retrieve the individual CPUUtilization data points that each Amazon EC2 instance sends to CloudWatch.
Instead, aggregated metrics can be queried, such as "Average CPU utilization over a 5-minute period".
This is different from CloudWatch Logs, which can be exported to Amazon S3.
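For example, a minimal sketch (hypothetical instance ID) of querying the 5-minute average CPU utilization with boto3:

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")

response = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # hypothetical
    StartTime=datetime.now(timezone.utc) - timedelta(hours=1),
    EndTime=datetime.now(timezone.utc),
    Period=300,                 # 5-minute aggregation window
    Statistics=["Average"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"])
```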

AWS S3 replication without versioning

I have enabled AWS S3 replication in an account to replicate the same S3 data to another account, and it all works fine. But I don't want to use S3 versioning because of its additional cost.
Is there any other way to accommodate this scenario?
The automated Same-Region Replication (SRR) and Cross-Region Replication (CRR) features require versioning to be activated due to the way that data is replicated between S3 buckets. For example, a new version of an object might be uploaded while a bucket is still being replicated, which can lead to problems without separate versions.
If you do not wish to retain other versions, you can configure Amazon S3 Lifecycle Rules to expire (delete) older versions.
An alternative method would be to run the AWS CLI aws s3 sync command at regular intervals to copy the data between buckets. This command would need to be run on an Amazon EC2 instance or even your own computer. It could be triggered by a cron schedule (Linux) or a Scheduled Task (Windows).
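If you prefer to stay in code rather than shell out to the CLI, here is a minimal sketch (hypothetical bucket names) of the same idea in boto3: copy any objects missing from the destination bucket. It is a simplification of aws s3 sync (it only checks for missing keys, not changed ones) and could be run on a schedule:

```python
import boto3

s3 = boto3.client("s3")
SOURCE, DEST = "my-source-bucket", "my-backup-bucket"   # hypothetical names

paginator = s3.get_paginator("list_objects_v2")

# Collect the keys that already exist in the destination bucket.
existing = set()
for page in paginator.paginate(Bucket=DEST):
    existing.update(obj["Key"] for obj in page.get("Contents", []))

# Copy anything from the source that the destination does not have yet.
for page in paginator.paginate(Bucket=SOURCE):
    for obj in page.get("Contents", []):
        if obj["Key"] not in existing:
            s3.copy_object(
                Bucket=DEST,
                Key=obj["Key"],
                CopySource={"Bucket": SOURCE, "Key": obj["Key"]},
            )
```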

Where are AWS RDS backups stored (geographically)?

I use AWS and have automatic backups enabled.
For one of our clients, we need to know exactly where the backup data is stored.
From the AWS FAQ website, I can see that:
Q: Where are my automated backups and DB Snapshots stored and how do I manage their retention?
Amazon RDS DB snapshots and automated backups are stored in S3.
My understanding is that you can have an S3 instance located anywhere you want, so it's not clear to me where this data is.
Just to be clear, I'm interested in the physical location (is it Europe, the US, ...?).
It is stored in the same AWS region where the RDS instance is located.
When you directly store data in S3, you store it in an S3 container called a bucket (S3 doesn't use the term "instance") in the AWS region you choose, and the data always remains only in that region.
RDS snapshots and backups are not something you store directly -- RDS stores it for you, on your behalf -- so there is no option to select the bucket or region: it is always stored in an S3 bucket in the same AWS region where the RDS instance is located. This can't be modified.
The data from RDS backups and snapshots is not visible to you from the S3 console, because it is not stored in one of your S3 buckets -- it is stored in a bucket owned and controlled by the RDS service within the region.
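To see or manage those snapshots you go through the RDS API rather than S3. A minimal sketch (hypothetical region and instance identifier) of listing them with boto3:

```python
import boto3

# Snapshots live in the same region as the instance; the client's region
# determines which region's snapshots are returned.
rds = boto3.client("rds", region_name="eu-west-1")   # hypothetical region

paginator = rds.get_paginator("describe_db_snapshots")
for page in paginator.paginate(DBInstanceIdentifier="my-db-instance"):  # hypothetical
    for snap in page["DBSnapshots"]:
        print(snap["DBSnapshotIdentifier"], snap["SnapshotType"], snap["Status"])
```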
According to this:
Your Amazon RDS backup storage for each region is composed of the automated backups and manual DB snapshots for that region. Your backup storage is equivalent to the sum of the database storage for all instances in that region.
I think that means it is stored in that region only, and S3 stores data like this:
Amazon S3 redundantly stores data in multiple facilities and on multiple devices within each facility. To increase durability, Amazon S3 synchronously stores your data across multiple facilities before confirming that the data has been successfully stored.
https://aws.amazon.com/rds/details/backup/
...By default, Amazon RDS creates and saves automated backups of your DB instance securely in Amazon S3 for a user-specified retention period.
...Database snapshots are user-initiated backups of your instance stored in Amazon S3 that are kept until you explicitly delete them

Amazon Redshift to Glacier

I would like to backup a snapshot of my Amazon Redshift cluster into Amazon Glacier.
I don't see a way to do that using the API of either Redshift or Glacier. I also don't see a way to export a Redshift snapshot to a custom S3 bucket so that I can write a script to move the files into Glacier.
Any suggestion on how I should accomplish this?
There is no function in Amazon Redshift to export data directly to Amazon Glacier.
Amazon Redshift snapshots, while stored in Amazon S3, are only accessible via the Amazon Redshift console for restoring data back to Redshift. The snapshots are not accessible for any other purpose (e.g. moving to Amazon Glacier).
The closest option for moving data from Redshift to Glacier would be to use the Redshift UNLOAD command to export data to files in Amazon S3, and then to lifecycle the data from S3 into Glacier.
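A minimal sketch of that approach (hypothetical cluster, database, table, bucket, and IAM role): run UNLOAD through the Redshift Data API, then add a lifecycle rule that transitions the exported objects to Glacier:

```python
import boto3

redshift_data = boto3.client("redshift-data")
s3 = boto3.client("s3")

# 1. Export the table to S3. The IAM role must allow Redshift to write to the bucket.
redshift_data.execute_statement(
    ClusterIdentifier="my-redshift-cluster",   # hypothetical
    Database="mydb",
    DbUser="admin",
    Sql="""
        UNLOAD ('SELECT * FROM my_table')
        TO 's3://my-archive-bucket/redshift/my_table_'
        IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftUnloadRole'
        FORMAT AS PARQUET;
    """,
)

# 2. Transition everything under the prefix to Glacier after 30 days.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-archive-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-redshift-unloads",
            "Filter": {"Prefix": "redshift/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
        }]
    },
)
```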
Alternatively, simply keep the data in Redshift snapshots. Backup storage beyond the provisioned storage size of your cluster and backups stored after your cluster is terminated are billed at standard Amazon S3 rates. This has the benefit of being easily loadable back into a Redshift cluster. While you'd be paying slightly more for storage (compared to Glacier), the real cost saving is in the convenience of quickly restoring the data in future.
Is there any use case for taking a backup, given that Redshift automatically keeps snapshots? Here is a reference link

How to keep both data on aws s3 and glacier

I want to keep a backup of an AWS S3 bucket. If I use Glacier, it will archive the files from the bucket and move them to Glacier, but it will also delete the files from S3. I don't want to delete the files from S3. One option is to try an EBS volume: you can mount the AWS S3 bucket with s3fs and copy it to the EBS volume. Another way is to do an rsync of the existing bucket to a new bucket which will act as a clone. Is there any other way?
What you are looking for is cross-region replication:
https://aws.amazon.com/blogs/aws/new-cross-region-replication-for-amazon-s3/
Set up versioning and set up the replication.
On the target bucket you could set up a lifecycle policy to archive to Glacier (or you could just use the bucket as a backup as is).
(This will only work between two regions, i.e. the buckets cannot be in the same region.)
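A minimal sketch (hypothetical bucket names and IAM role) of enabling versioning and configuring the replication with boto3; it assumes both buckets are reachable from this client, otherwise create one client per bucket region:

```python
import boto3

s3 = boto3.client("s3")
SOURCE, TARGET = "my-source-bucket", "my-backup-bucket"      # hypothetical
ROLE_ARN = "arn:aws:iam::123456789012:role/s3-replication"   # hypothetical

# Replication requires versioning on both the source and destination buckets.
for bucket in (SOURCE, TARGET):
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )

# Replicate every object in the source bucket to the target bucket.
s3.put_bucket_replication(
    Bucket=SOURCE,
    ReplicationConfiguration={
        "Role": ROLE_ARN,
        "Rules": [{
            "ID": "replicate-everything",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {"Prefix": ""},
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {"Bucket": f"arn:aws:s3:::{TARGET}"},
        }],
    },
)
```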
If you want your data to be present in both the primary and backup locations, then this is more of a data replication use case.
Consider using AWS Lambda, which is an event-driven compute service.
You can write a simple piece of code to copy the data wherever you want. It will execute every time there is a change in the S3 bucket.
For more info, check the official documentation.
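For example, a minimal sketch (hypothetical backup bucket name) of a Lambda handler, subscribed to the source bucket's ObjectCreated events, that copies each new object to a backup bucket:

```python
from urllib.parse import unquote_plus

import boto3

s3 = boto3.client("s3")
BACKUP_BUCKET = "my-backup-bucket"   # hypothetical destination bucket

def handler(event, context):
    # S3 event notifications can batch multiple records per invocation.
    for record in event["Records"]:
        source_bucket = record["s3"]["bucket"]["name"]
        # Keys in S3 event records are URL-encoded.
        key = unquote_plus(record["s3"]["object"]["key"])
        s3.copy_object(
            Bucket=BACKUP_BUCKET,
            Key=key,
            CopySource={"Bucket": source_bucket, "Key": key},
        )
```

The Lambda's execution role needs read access to the source bucket and write access to the backup bucket.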