AWS DocumentDB multi-region replication - amazon-web-services

I know that AWS DocumentDB does not support multi-region replication, and even snapshots cannot be shared across regions.
Please suggest how we can do the replication manually.

Sam,
AWS released cross-region snapshot copy today (7/10/20), so that should get you what you need. Good luck.
https://aws.amazon.com/about-aws/whats-new/2020/07/amazon-documentdb-support-cross-region-snapshot-copy/
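For reference, here is a minimal boto3 sketch of that cross-region snapshot copy. Identifiers, the account ID, and the KMS key are placeholders; the client is created in the destination region, and SourceRegion points at where the source snapshot lives.

    import boto3

    # Copy a cluster snapshot from us-east-1 into us-west-2.
    # The client is created in the DESTINATION region; identifiers below are placeholders.
    docdb = boto3.client("docdb", region_name="us-west-2")

    response = docdb.copy_db_cluster_snapshot(
        SourceDBClusterSnapshotIdentifier=(
            "arn:aws:rds:us-east-1:123456789012:cluster-snapshot:my-docdb-snapshot"
        ),
        TargetDBClusterSnapshotIdentifier="my-docdb-snapshot-uswest2",
        SourceRegion="us-east-1",   # boto3 uses this to pre-sign the cross-region copy request
        KmsKeyId="alias/aws/rds",   # destination-region KMS key, needed for encrypted snapshots
    )
    print(response["DBClusterSnapshot"]["Status"])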

Thank you for the feedback, Sam. A couple of options are to use change streams plus a Lambda/worker to replicate the data (a rough sketch follows), or to take backups to S3 with mongodump and utilize S3's cross-region replication capabilities.
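A rough sketch of the change-stream approach with pymongo is below. Connection strings, database, and collection names are placeholders, and change streams must already be enabled on the source collection (DocumentDB requires the modifyChangeStreams admin command). A real worker would also persist the resume token so it can restart without losing events.

    from pymongo import MongoClient

    # Placeholder connection strings for the source and target DocumentDB clusters.
    source = MongoClient("mongodb://user:pass@source-cluster:27017/?tls=true&replicaSet=rs0")
    target = MongoClient("mongodb://user:pass@target-cluster:27017/?tls=true&replicaSet=rs0")

    src_coll = source["mydb"]["orders"]
    dst_coll = target["mydb"]["orders"]

    # Tail the change stream on the source and apply each event to the target.
    with src_coll.watch(full_document="updateLookup") as stream:
        for change in stream:
            op = change["operationType"]
            key = change["documentKey"]
            if op in ("insert", "update", "replace"):
                dst_coll.replace_one(key, change["fullDocument"], upsert=True)
            elif op == "delete":
                dst_coll.delete_one(key)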

Amazon DocumentDB now supports global clusters. The primary cluster supports writes and read scaling, and up to five regions can be added as read-only secondaries; see https://aws.amazon.com/documentdb/global-clusters/
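If you go the global-clusters route, creating one from an existing primary cluster looks roughly like the boto3 sketch below. All identifiers are placeholders, and the secondary-cluster step assumes DocumentDB's CreateDBCluster accepts a GlobalClusterIdentifier the same way the global-clusters documentation describes.

    import boto3

    # Promote an existing regional cluster to be the primary of a new global cluster.
    docdb = boto3.client("docdb", region_name="us-east-1")
    docdb.create_global_cluster(
        GlobalClusterIdentifier="my-global-docdb",
        SourceDBClusterIdentifier="arn:aws:rds:us-east-1:123456789012:cluster:my-primary-cluster",
    )

    # In each secondary region, create a read-only cluster attached to the global cluster.
    docdb_west = boto3.client("docdb", region_name="us-west-2")
    docdb_west.create_db_cluster(
        DBClusterIdentifier="my-secondary-cluster",
        Engine="docdb",
        GlobalClusterIdentifier="my-global-docdb",
    )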

Related

Elasticsearch Cross Cluster Replication (CCR) on Amazon AWS and MS Azure

I am wondering if there is support for Elasticsearch cross-cluster replication on AWS and Azure?
I saw an AWS announcement saying that they are going to support cross-cluster search (not sure whether that is related to my query, though).
Could you please advise whether it is supported, or whether there is any news suggesting that it might be supported in the near future?
Any help is highly appreciated.
AWS Elasticsearch doesn't have CCR (Cross-Cluster Replication).
However, depending on your RTO and RPO, you can achieve the same result by taking manual snapshots into your own S3 bucket and restoring them into a new domain, thereby replicating the data.
For more info on manual snapshots, you can go through here
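A rough Python sketch of that manual-snapshot flow is below: register an S3 snapshot repository against the domain endpoint and take a snapshot. The domain endpoint, bucket, and IAM role are placeholders, the role must allow the service to write to the bucket, and requests must be SigV4-signed (here via requests-aws4auth).

    import boto3
    import requests
    from requests_aws4auth import AWS4Auth

    region = "us-east-1"
    host = "https://my-es-domain.us-east-1.es.amazonaws.com"   # placeholder domain endpoint

    credentials = boto3.Session().get_credentials()
    awsauth = AWS4Auth(credentials.access_key, credentials.secret_key,
                       region, "es", session_token=credentials.token)

    # 1. Register an S3 repository for manual snapshots (bucket and role are placeholders).
    repo_body = {
        "type": "s3",
        "settings": {
            "bucket": "my-es-snapshots",
            "region": region,
            "role_arn": "arn:aws:iam::123456789012:role/es-snapshot-role",
        },
    }
    requests.put(f"{host}/_snapshot/manual-repo", auth=awsauth, json=repo_body).raise_for_status()

    # 2. Take a snapshot. The same repository (with a replicated bucket) can then be
    #    registered on a domain in another region and restored there.
    requests.put(f"{host}/_snapshot/manual-repo/snapshot-1", auth=awsauth).raise_for_status()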
At last, Amazon has released cross-cluster replication in the new OpenSearch Service (formerly Amazon Elasticsearch Service). Here is the announcement: https://aws.amazon.com/about-aws/whats-new/2021/10/amazon-opensearch-service-amazon-elasticsearch-service-cross-cluster-replication/

Migrate data from one AWS DocumentDB account to another DocumentDB account?

I have two different accounts in AWS, and I need to move/copy the data from a DocumentDB cluster in one account to the other account.
Does anyone know how to do this task? I am thinking about doing it programmatically, but I am not sure if that is a good idea.
Thank you in advance for your help.
DocumentDB allows you to share a cluster snapshot with a different AWS account in the same region. You can find more information on it here:
https://docs.aws.amazon.com/documentdb/latest/developerguide/backup-restore.db-cluster-snapshot-share.html
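A minimal boto3 sketch of sharing a manual cluster snapshot with the second account and restoring it there is below. Snapshot names, account IDs, and the region are placeholders, and the restore step runs under the target account's credentials.

    import boto3

    # In the SOURCE account: allow the target account to restore the manual snapshot.
    docdb = boto3.client("docdb", region_name="us-east-1")
    docdb.modify_db_cluster_snapshot_attribute(
        DBClusterSnapshotIdentifier="my-manual-snapshot",
        AttributeName="restore",
        ValuesToAdd=["210987654321"],   # placeholder target account ID
    )

    # In the TARGET account (separate credentials): restore a new cluster from the shared snapshot.
    target = boto3.client("docdb", region_name="us-east-1")
    target.restore_db_cluster_from_snapshot(
        DBClusterIdentifier="restored-cluster",
        SnapshotIdentifier="arn:aws:rds:us-east-1:123456789012:cluster-snapshot:my-manual-snapshot",
        Engine="docdb",
    )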

Multiple HashiCorp Vault servers in different AZs in AWS

I have 3 Availability Zones in my AWS VPC and I would like to run Vault to connect to S3. I would like to run 3 Vault servers (one for each zone) all of them syncing to the same S3 bucket. Is this HA scenario for Vault possible?
I read that Vault doesn't support HA using S3 as the backend and that I might need to use Consul (which runs 3 servers by default). I'm a bit confused about this. All I want is to run multiple Vault servers, all storing/reading secrets from the same S3 bucket.
Thanks for your inputs.
Abdul
Note that you could use DynamoDB to get an Amazon-managed service with HA support:
High Availability – the DynamoDB storage backend supports high availability. Because DynamoDB uses the time on the Vault node to implement the session lifetimes on its locks, significant clock skew across Vault nodes could cause contention issues on the lock.
https://www.vaultproject.io/docs/configuration/storage/dynamodb.html
There are several storage backends in Vault, and only some of them support HA, such as Consul. However, if a backend doesn't support HA, that doesn't mean it can't be used at all.
So, if you need to run multiple Vault instances, each one independent of the others, you should be able to use S3 as a storage backend. But if you need HA, you need to use Consul or another backend that supports HA.
Hope this helps.

AWS EMR migration from us-east to us-west

I am planning to move an EMR cluster from us-east to us-west. I have data residing in HDFS as well as S3, but due to a lack of proper documentation I am unable to get started.
Does anyone have any experience doing this?
You can use the s3-dist-cp tool on EMR to copy data from HDFS to S3, and later use the same tool to copy from S3 to HDFS on the cluster in the other region (a sample step definition is sketched below). Also note that it is always recommended to use EMR with S3 buckets in the same region.
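For example, here is a boto3 sketch that submits s3-dist-cp as a step on the source cluster to push an HDFS directory into an S3 bucket created in the target region. The cluster ID, paths, and bucket are placeholders; the reverse copy on the new us-west cluster just swaps --src and --dest.

    import boto3

    emr = boto3.client("emr", region_name="us-east-1")

    # Submit an s3-dist-cp step that copies an HDFS directory to an S3 bucket
    # (create the bucket in the target region so the new cluster reads it locally).
    emr.add_job_flow_steps(
        JobFlowId="j-XXXXXXXXXXXXX",          # placeholder cluster ID
        Steps=[{
            "Name": "hdfs-to-s3",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": [
                    "s3-dist-cp",
                    "--src", "hdfs:///user/hadoop/data",
                    "--dest", "s3://my-uswest-bucket/data",
                ],
            },
        }],
    )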

Uploading File to S3, then process in EMR and last transfer to Redshift

I am new to this forum and to the technology, and I am looking for your advice. I am working on a POC and below are my requirements. Could you please guide me on the way to achieve the result?
Copy data from NAS to S3.
Use S3 as a source in EMR Job with target to S3/Redshift.
Any link or PDF will also be helpful.
Thanks,
Pardeep
There's a lot here that you're asking, and there's not a lot of info on your use case to go by, so I'm going to be very general in my answer; hopefully it at least points you in the right direction.
You can use Lambda to copy data from your NAS to S3. Assuming your NAS is on-premises and you have a VPN into your VPC, or even Direct Connect configured, you can use a VPC-enabled Lambda function to read from the on-premises NAS and write to S3 (a rough handler sketch follows below).
If your NAS is running on EC2, the above remains the same, except there's no need for VPN or Direct Connect.
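As a very rough illustration of that pattern, here is a Lambda handler that walks a directory and uploads every file to S3 with boto3. The bucket name and mount path are placeholders, and it assumes the share has already been exposed to the function as a local path (for example through an EFS mount, since Lambda cannot mount an arbitrary NFS share by itself).

    import os
    import boto3

    s3 = boto3.client("s3")

    BUCKET = "my-target-bucket"      # placeholder
    MOUNT_PATH = "/mnt/nas-share"    # placeholder: path where the share is exposed to the function

    def handler(event, context):
        # Walk the mounted share and upload every file, preserving the relative path as the key.
        for root, _dirs, files in os.walk(MOUNT_PATH):
            for name in files:
                local_path = os.path.join(root, name)
                key = os.path.relpath(local_path, MOUNT_PATH)
                s3.upload_file(local_path, BUCKET, key)
        return {"status": "done"}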
Are you looking to kick off the EMR job from Lambda? You can use S3 as a source for EMR to then output to S3 either from within Lambda or via other means as well.
If you can provide more info on your use case we could probably give you a better quality answer.
Copy data from NAS to S3.
It really depends on the amount of data and the frequency at which you run the copy job. If the data is in GBs, then you can install the AWS CLI on a machine where the NFS share is attached. AWS CLI commands like cp can be multithreaded and can easily copy your datasets to S3 (a boto3 equivalent is sketched after the links below). You might also enable S3 Transfer Acceleration to speed things up. Having AWS Direct Connect to your company network can also speed up transfers from on-premises to AWS.
http://docs.aws.amazon.com/cli/latest/topic/s3-config.html
http://docs.aws.amazon.com/AmazonS3/latest/dev/transfer-acceleration.html
https://aws.amazon.com/directconnect/
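If you prefer to script that copy instead of using aws s3 cp, a boto3 equivalent with multithreaded, multipart transfers looks roughly like this (the bucket, key, and local path are placeholders):

    import boto3
    from boto3.s3.transfer import TransferConfig

    s3 = boto3.client("s3")

    # Multipart, multithreaded upload settings, roughly equivalent to tuning
    # max_concurrent_requests / multipart_chunksize in the AWS CLI s3 config.
    config = TransferConfig(
        multipart_threshold=64 * 1024 * 1024,   # switch to multipart above 64 MB
        multipart_chunksize=64 * 1024 * 1024,
        max_concurrency=10,                     # parallel upload threads
    )

    s3.upload_file("/mnt/nfs/data/export.csv", "my-landing-bucket", "raw/export.csv", Config=config)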
If the data is in TBs (and probably distributed across multiple volumes), then you might have to consider physical transfer options like AWS Snowball, AWS Import/Export, or AWS Snowmobile, depending on the use case.
https://aws.amazon.com/cloud-data-migration/
Use S3 as a source in EMR Job with target to S3/Redshift.
Again, as there are a lot of applications on EMR, there are a lot of choices. Redshift supports COPY/UNLOAD commands to and from S3, which any application can make use of. If you want to use Spark on EMR, then installing the Databricks spark-redshift driver is a viable option (a PySpark sketch follows the links below).
https://github.com/databricks/spark-redshift
https://databricks.com/blog/2015/10/19/introducing-redshift-data-source-for-spark.html
https://aws.amazon.com/blogs/big-data/powering-amazon-redshift-analytics-with-apache-spark-and-amazon-machine-learning/
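A minimal PySpark sketch of that S3-to-Redshift flow with the spark-redshift data source is below. The JDBC URL, bucket, table, and IAM role are placeholders, and it assumes the spark-redshift package is already on the cluster's classpath.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("s3-to-redshift").getOrCreate()

    # Read the source data that was landed in S3.
    df = spark.read.parquet("s3://my-landing-bucket/raw/")

    # Write to Redshift via the spark-redshift data source. The driver stages the
    # data in tempdir on S3 and issues a Redshift COPY behind the scenes.
    (df.write
       .format("com.databricks.spark.redshift")
       .option("url", "jdbc:redshift://my-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev?user=admin&password=***")
       .option("dbtable", "public.my_table")
       .option("tempdir", "s3://my-landing-bucket/tmp/")
       .option("aws_iam_role", "arn:aws:iam::123456789012:role/redshift-copy-role")
       .mode("append")
       .save())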