AWS Elasticsearch: how to load data from one cluster into another - amazon-web-services

On AWS Elasticsearch, is there a convenient way to load data from one cluster into another?
Thanks.

You can do so by restoring a snapshot of your AWS Elasticsearch cluster, provided you enabled automated snapshots when configuring the cluster.
You may want to take a look at the "Working with Manual Index Snapshots (AWS CLI)" section in the AWS Elasticsearch document below:
Managing Amazon Elasticsearch Service Domains
Below is an excerpt:
Amazon Elasticsearch Service (Amazon ES) takes daily automated
snapshots of the primary index shards in an Amazon ES domain, as
described in Configuring Snapshots. However, you must contact the AWS
Support team to restore an Amazon ES domain with an automated
snapshot. If you need greater flexibility, you can take snapshots
manually and manage them in a snapshot repository, an Amazon S3
bucket.
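As a rough sketch of the manual-snapshot route described in the excerpt, the snippet below builds the requests for registering an S3 bucket as a snapshot repository and for starting a snapshot. The repository, bucket, role ARN, and snapshot names are placeholders, and both helper functions are hypothetical. Sending the requests to the domain endpoint requires AWS request signing, which is not shown here.

```python
import json


def repo_registration(repo, bucket, region, role_arn):
    """Return (method, path, body) for registering an S3 snapshot repository."""
    body = {
        "type": "s3",
        "settings": {
            "bucket": bucket,      # placeholder: your own S3 bucket
            "region": region,      # region where the bucket lives
            "role_arn": role_arn,  # placeholder: IAM role the service assumes
        },
    }
    return "PUT", f"/_snapshot/{repo}", json.dumps(body)


def take_snapshot(repo, snapshot):
    """Return (method, path) for starting a manual snapshot in that repository."""
    return "PUT", f"/_snapshot/{repo}/{snapshot}"
```

The same repository can then be registered on the second cluster, and the snapshot restored there.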

Related

Does an AWS Elasticsearch snapshot contain data?

I was reading the AWS documentation on Elasticsearch, and in the latest versions they take a snapshot of the AWS ES cluster every hour and store it in S3.
This could prove very useful for recovery.
But I could not find out whether this snapshot contains only the cluster information or the data as well.
Can someone confirm whether it stores the data as well, or just the cluster information?
Thanks!
From the AWS documentation:
On Amazon Elasticsearch Service, snapshots come in two forms: automated and manual.
Automated snapshots are only for cluster recovery. You can use them to restore your domain in the event of red cluster status or other data loss. Amazon ES stores automated snapshots in a preconfigured Amazon S3 bucket at no additional charge.
Manual snapshots are for cluster recovery or moving data from one cluster to another. As the name suggests, you have to initiate manual snapshots. These snapshots are stored in your own Amazon S3 bucket, and standard S3 charges apply. If you have a snapshot from a self-managed Elasticsearch cluster, you can even use that snapshot to migrate to an Amazon ES domain.
Both forms support cluster recovery, and manual snapshots additionally support data migration. Networking and cluster configuration for the Elasticsearch service itself are managed entirely through the AWS API, so they should be handled with infrastructure as code (such as CloudFormation or Terraform).

Migrating from one AWS Elasticsearch cluster to another AWS Elasticsearch cluster in a different region

I am trying to move data from one AWS Elasticsearch cluster in Oregon to another AWS Elasticsearch cluster in N. Virginia. I have registered the repository in the source ES cluster and taken a manual snapshot to S3 (in Oregon). Now I am trying to register a repository in the destination ES cluster pointing at the same S3 location, but it is not letting me do it.
It throws an error saying that the S3 bucket should be in the same region.
I am now stuck. Can anybody suggest a method for this?
Based on the comments, the solution is to make a backup of the cluster into a bucket in Oregon, copy it from the bucket to a bucket in N.Virginia, and then restore the cluster in the N.Virginia region using the second bucket.
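The three steps above can be sketched as a small plan builder. This is only an illustration: the function name, bucket names, repository name, and role ARN are all assumptions, and the Elasticsearch requests still need to be signed and sent to the destination domain.

```python
def cross_region_plan(src_bucket, src_region, dst_bucket, dst_region,
                      repo, snapshot, role_arn):
    """Describe the copy, register, and restore steps as plain data."""
    # Step 1: copy the snapshot files between buckets with the AWS CLI.
    copy_cmd = [
        "aws", "s3", "sync",
        f"s3://{src_bucket}", f"s3://{dst_bucket}",
        "--source-region", src_region,
        "--region", dst_region,
    ]
    # Step 2: register the copied bucket as a repository on the destination.
    # Assumption: some regions may need different settings (for example an
    # "endpoint" entry), so check the current AWS documentation.
    register = ("PUT", f"/_snapshot/{repo}", {
        "type": "s3",
        "settings": {"bucket": dst_bucket, "region": dst_region,
                     "role_arn": role_arn},
    })
    # Step 3: restore the snapshot from that repository.
    restore = ("POST", f"/_snapshot/{repo}/{snapshot}/_restore")
    return {"copy": copy_cmd, "register": register, "restore": restore}
```

Keeping the destination repository in the destination region is what avoids the "same region" error from the question.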

AWS Elastic Block Storage (EBS) Snapshot Access History

How can we find out who has accessed an AWS EBS snapshot that was publicly available?
I am not able to find the option in the AWS Management Console.
AWS CloudTrail contains information about how Amazon EBS snapshots are used within your own AWS account.
It is not possible to obtain information about how public or shared Amazon EBS snapshots have been used outside of your AWS account.

How to use AWS DMS from one region to another?

I am trying to use AWS DMS to move data from a source database (AWS RDS MySQL) in the Paris region (eu-west-3) to a target database (AWS Redshift) in the Ireland region (eu-west-1). The goal is to continuously replicate ongoing changes.
I am running into errors like this one:
An error occurred (InvalidResourceStateFault) when calling the CreateEndpoint operation: The redshift cluster datawarehouse needs to be in the same region as the current region. The cluster's region is eu-west-1 and the current region is eu-west-3.
The documentation says :
The only requirement to use AWS DMS is that one of your endpoints must
be on an AWS service.
So what I am trying to do should be possible. In practice, it seems it's not allowed.
How can I use AWS DMS from one region to another?
In what region should my endpoints be?
In what region should my replication task be?
My replication instance has to be in the same region as the RDS MySQL instance because they need to share a subnet.
AWS provides this whitepaper called "Migrating AWS Resources to a New AWS Region", updated last year. You may want to contact their support, but an idea would be to move your RDS to another RDS in the proper region, before migrating to Redshift. In the whitepaper, they provide an alternative way to migrate RDS (without DMS, if you don't want to use it for some reason):
1) Stop all transactions or take a snapshot (however, changes after this point in time are lost and might need to be reapplied to the target Amazon RDS DB instance).
2) Using a temporary EC2 instance, dump all data from Amazon RDS to a file:
   For MySQL, make use of the mysqldump tool. You might want to compress this dump (see bzip or gzip).
   For MS SQL, use the bcp utility to export data from the Amazon RDS SQL DB instance into files. You can use the SQL Server Generate and Publish Scripts Wizard to create scripts for an entire database or for just selected objects. Note: Amazon RDS does not support Microsoft SQL Server backup file restores.
   For Oracle, use the Oracle Export/Import utility or the Data Pump feature (see http://aws.amazon.com/articles/AmazonRDS/4173109646282306).
   For PostgreSQL, you can use the pg_dump command to export data.
3) Copy this data to an instance in the target region using standard tools such as CP, FTP, or Rsync.
4) Start a new Amazon RDS DB instance in the target region, using the new Amazon RDS security group.
5) Import the saved data.
6) Verify that the database is active and your data is present.
7) Delete the old Amazon RDS DB instance in the source region.
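For the MySQL case, the dump and import steps could look roughly like the commands built below. This is a sketch under assumptions: the host names, user, database, and file name are placeholders, the helper functions are hypothetical, and `--single-transaction` is added here on the assumption that the tables use InnoDB.

```python
import shlex


def dump_command(host, user, database, outfile):
    """Build a shell pipeline that dumps a MySQL database and gzips it."""
    dump = ["mysqldump", "-h", host, "-u", user, "-p",
            "--single-transaction",  # consistent dump for InnoDB tables
            database]
    return " ".join(shlex.quote(a) for a in dump) + " | gzip > " + shlex.quote(outfile)


def load_command(host, user, database, infile):
    """Build a shell pipeline that loads the gzipped dump into the target."""
    load = ["mysql", "-h", host, "-u", user, "-p", database]
    return "gunzip < " + shlex.quote(infile) + " | " + " ".join(shlex.quote(a) for a in load)
```

The first command would run on the temporary EC2 instance in the source region, and the second on an instance in the target region after the file has been copied over.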
I found a workaround that I am currently testing.
I declare "Postgres" as the engine type for the Redshift cluster. This tricks AWS DMS into treating it as an external database, so AWS DMS no longer checks the region.
I think it will result in degraded performance, because DMS will probably feed data to Redshift using INSERTs instead of the COPY command.
Currently Redshift has to be in the same region as the replication instance.
The Amazon Redshift cluster must be in the same AWS account and the
same AWS Region as the replication instance.
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.Redshift.html
So one should create the replication instance in the Redshift region, inside a VPC.
Then use VPC peering to enable the replication instance to connect to the VPC of the MySQL instance in the other region.
https://docs.aws.amazon.com/vpc/latest/peering/what-is-vpc-peering.html
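The placement rule from this answer can be written down as a tiny helper. It is purely illustrative: the function and key names are made up, and it only encodes the constraint quoted above (replication instance in the Redshift region, VPC peering when the source lives elsewhere).

```python
def plan_dms_placement(source_region, redshift_region):
    """Decide where the DMS replication instance goes for a Redshift target."""
    return {
        # Redshift must share a region with the replication instance.
        "replication_instance_region": redshift_region,
        # The source is reached over VPC peering when it is in another region.
        "needs_vpc_peering": source_region != redshift_region,
    }
```

For the setup in the question, `plan_dms_placement("eu-west-3", "eu-west-1")` places the replication instance in eu-west-1 and flags that peering to the Paris VPC is needed.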

Does your Amazon Redshift database need to be in the same region as your Machine Learning model?

When trying to use Amazon Redshift to create a datasource for my Machine Learning model, I encountered the following error when testing the access of my IAM role:
There is no '' cluster, or the cluster is not in the same region as your Amazon ML service. Specify a cluster in the same region as the Amazon ML service.
Is there any way around this? It would be a huge pain otherwise, since all of our development team's data is stored in a region that Machine Learning doesn't work in.
That's an interesting situation to be in.
What you can probably do:
1) Wait for Amazon Web Services to support AWS ML in your preferred region. (That could be a long wait, though.)
2) Or create a backup plan for your Redshift data.
Amazon Redshift provides default tools to back up your
cluster via snapshots to Amazon Simple Storage Service (Amazon S3).
These snapshots can be restored in any AZ in that region or
transferred automatically to other regions of your choice (in your
case, the region where ML is running).
There is (probably) no other way to use ML with Redshift when the two are in different regions.
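As a sketch of option 2, the helper below prepares the parameters for turning on cross-region snapshot copy. The parameter names follow the Redshift EnableSnapshotCopy API; the helper itself, the cluster identifier, and the default retention value are assumptions for illustration.

```python
def snapshot_copy_request(cluster_id, destination_region, retention_days=7):
    """Build the parameters for enabling cross-region snapshot copy."""
    if retention_days < 1:
        raise ValueError("retention_days must be at least 1")
    return {
        "ClusterIdentifier": cluster_id,           # placeholder cluster name
        "DestinationRegion": destination_region,   # region where ML is available
        "RetentionPeriod": retention_days,         # days to keep copied snapshots
    }

# With boto3 installed and credentials configured, the request could be sent as:
#   boto3.client("redshift", region_name=source_region).enable_snapshot_copy(**params)
```

A cluster restored from a copied snapshot in the destination region can then serve as the datasource there.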
Hope it will help !