Simplest way to get data from AWS MySQL RDS to AWS Elasticsearch? - amazon-web-services

I have data in an AWS RDS, and I would like to pipe it over to an AWS ES instance, preferably updating once an hour, or similar.
On my local machine, with a local mysql database and Elasticsearch database, it was easy to set this up using Logstash.
Is there a "native" AWS way to do the same thing? Or do I need to set up an EC2 server and install Logstash on it myself?

You can achieve the same thing with your local Logstash: simply point your jdbc input at your RDS database and the elasticsearch output at your AWS ES instance. If you need to run this regularly, then yes, you'd need to set up a small instance to run Logstash on.
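For reference, that jdbc-input-plus-elasticsearch-output pipeline boils down to running a SQL query on a schedule and bulk-indexing the rows. A minimal Python sketch of that loop, purely for illustration (hostnames, credentials, the articles table and the index name are all hypothetical, and the ES domain's access policy is assumed to allow the caller):

    # Sketch of an hourly MySQL -> Elasticsearch sync (roughly what the Logstash
    # jdbc input + elasticsearch output do). Hostnames, credentials, the
    # "articles" table and the index name are hypothetical.
    import pymysql
    from elasticsearch import Elasticsearch, helpers

    def sync_once():
        db = pymysql.connect(host="mydb.abc123.eu-west-1.rds.amazonaws.com",
                             user="app", password="...", database="blog",
                             cursorclass=pymysql.cursors.DictCursor)
        es = Elasticsearch("https://my-domain.eu-west-1.es.amazonaws.com:443")
        with db.cursor() as cur:
            cur.execute("SELECT id, title, body, updated_at FROM articles")
            rows = cur.fetchall()
        # Bulk-index each row, reusing the primary key as the document id
        actions = ({"_index": "articles", "_id": row["id"], "_source": row}
                   for row in rows)
        helpers.bulk(es, actions)
        db.close()

    if __name__ == "__main__":
        sync_once()  # run from cron (or a scheduler of your choice) once an hour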
A more "native" AWS solution to achieve the same thing would include the use of Amazon Kinesis and AWS Lambda.
Here's a good article explaining how to connect it all together, namely:
how to stream RDS data into a Kinesis Stream
how to configure a Lambda function to handle the stream
how to push the data to your AWS ES instance (a sketch of the Lambda handler for these last two steps follows this list)
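The Lambda part of that pipeline is essentially a Kinesis event handler that decodes each record and indexes it into the ES domain. A minimal sketch, assuming a hypothetical domain endpoint, index name and record format, and an access policy that allows the call (a real setup would typically sign requests with the Lambda's IAM role):

    # Sketch of a Lambda handler that takes Kinesis records and pushes them to
    # an Amazon ES domain. Endpoint, index name and record format are hypothetical.
    import base64
    import json
    import urllib.request

    ES_ENDPOINT = "https://my-domain.eu-west-1.es.amazonaws.com"  # hypothetical

    def lambda_handler(event, context):
        for record in event["Records"]:
            payload = base64.b64decode(record["kinesis"]["data"])
            doc = json.loads(payload)
            doc_id = doc["id"]  # assumes each record carries its primary key
            req = urllib.request.Request(
                url=f"{ES_ENDPOINT}/articles/_doc/{doc_id}",
                data=json.dumps(doc).encode("utf-8"),
                headers={"Content-Type": "application/json"},
                method="PUT")
            urllib.request.urlopen(req)
        return {"indexed": len(event["Records"])}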

Take a look at AWS DMS. It's usually used for DB migrations; however, it also supports continuous data replication. This might simplify the process and be cost-effective.
You can use AWS Database Migration Service to perform continuous data replication. Continuous data replication has a multitude of use cases, including disaster recovery instance synchronization, geographic database distribution and dev/test environment synchronization. You can use DMS for both homogeneous and heterogeneous data replication for all supported database engines. The source or destination database can be located on your own premises outside of AWS, running on an Amazon EC2 instance, or it can be an Amazon RDS database. You can replicate data from a single database to one or more target databases, or consolidate and replicate data from multiple source databases to one or more target databases.
https://aws.amazon.com/dms/
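To give an idea of what the continuous-replication setup looks like through the API, here is a boto3 sketch that creates a full-load-plus-CDC task; the endpoint and replication-instance ARNs are placeholders for resources you would create first:

    # Sketch: create a DMS task that does an initial full load and then keeps
    # replicating ongoing changes (CDC). All ARNs below are placeholders.
    import json
    import boto3

    dms = boto3.client("dms", region_name="eu-west-1")

    table_mappings = {
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-everything",
            "object-locator": {"schema-name": "%", "table-name": "%"},
            "rule-action": "include",
        }]
    }

    dms.create_replication_task(
        ReplicationTaskIdentifier="rds-to-es-continuous",
        SourceEndpointArn="arn:aws:dms:eu-west-1:123456789012:endpoint:SOURCE",
        TargetEndpointArn="arn:aws:dms:eu-west-1:123456789012:endpoint:TARGET",
        ReplicationInstanceArn="arn:aws:dms:eu-west-1:123456789012:rep:INSTANCE",
        MigrationType="full-load-and-cdc",
        TableMappings=json.dumps(table_mappings),
    )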

Related

Moving data from AWS Aurora MySQL to another AWS Aurora MySQL with AWS Glue

I have an AWS Aurora MySQL database in my production environment, and a separate AWS Aurora MySQL database in my performance environment. Periodically, I'll create a copy of the production database and use that copy as the database in my performance environment, switching out the old performance database and replacing it with the new one.
Does AWS Glue provide the ability to move data from one Aurora MySQL database to another Aurora MySQL database? Could I use it to periodically (maybe once a week) copy over data from the Prod database to the Perf database? Also, if this is possible, would I be able to selectively copy data over from the prod MySQL, without necessarily losing data that was only added on the perf MySQL?
May I suggest not using Glue for a full copy of a database, but AWS DMS (Database Migration Service) instead.
You can do very quick 1-to-1 migrations between two databases with DMS. You spin up a DMS replication instance (a Linux server, low cost, which you can turn off when not in use), set up a source and a target endpoint and a replication task, and you're good to go.
Here is a guide you can follow: https://docs.aws.amazon.com/dms/latest/sbs/chap-rdsoracle2aurora.html
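To make those steps concrete, here is a boto3 sketch of the endpoint setup, with hypothetical cluster hostnames and credentials; the replication instance and the full-load task between the two endpoints are created in the same fashion:

    # Sketch: declare the prod Aurora cluster as a DMS source endpoint and the
    # perf cluster as a target endpoint. Hostnames and credentials are placeholders.
    import boto3

    dms = boto3.client("dms", region_name="us-east-1")

    dms.create_endpoint(
        EndpointIdentifier="prod-aurora-source",
        EndpointType="source",
        EngineName="aurora",  # Aurora MySQL
        ServerName="prod-cluster.cluster-abc123.us-east-1.rds.amazonaws.com",
        Port=3306,
        Username="dms_user",
        Password="...",
    )

    dms.create_endpoint(
        EndpointIdentifier="perf-aurora-target",
        EndpointType="target",
        EngineName="aurora",
        ServerName="perf-cluster.cluster-def456.us-east-1.rds.amazonaws.com",
        Port=3306,
        Username="dms_user",
        Password="...",
    )
    # A replication task between the two endpoints (MigrationType="full-load")
    # then performs the copy; it can be restarted for the weekly refresh.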

AWS Aurora Read Replicas

We have a couple of microservices running in AWS with the help of EKS. So far the data has been stored in an Oracle database which is on premises. We have a Kafka topic between the microservices and the Oracle DB.
Now we plan to move to AWS Aurora and have the database in the cloud as well. So far our microservices (implemented in Spring Boot) do not have any cloud-specific code, meaning no AWS SDK integration, so our codebase is cloud agnostic and we plan to keep it that way.
Now we plan to write a new microservice which will interact with Aurora, so that any other service that wants to read/write data in Aurora will call this new service.
For this new microservice, do we need to use the AWS SDK? If we just use the Aurora URL for data operations, will there be any performance impact compared to using the AWS SDK and Aurora APIs for data storage/retrieval?
We plan to have one master Aurora DB and 2 read replicas. As I understand it, write operations will be redirected to the master and read operations to an internal LB for the read replicas.
do we need to use the AWS SDK
For management operations only. You can't use the AWS SDK or the AWS API to actually read and write data in Aurora (you can only do this with Aurora Serverless via the Data API).
If we just use the Aurora URL for data operations
There is no other choice. You have to use Aurora endpoints.
As I understand it, write operations will be redirected to the master and read operations to an internal LB for the read replicas.
Not automatically. Your write operations must be explicitly directed to the writer endpoint. This means that in your app you have to use the writer endpoint when you write and the reader endpoint when you read.
The load balancing for replicas happens at the connection level, not at the operation level. But yes, each connection to the reader endpoint will go to a random replica.
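To make the endpoint split concrete, here is a small sketch (in Python with pymysql rather than Spring Boot, purely for illustration) with hypothetical cluster endpoints, credentials and an orders table; writes go through the cluster (writer) endpoint and reads through the reader endpoint:

    # Sketch: direct writes to the Aurora writer endpoint and reads to the reader
    # endpoint. Endpoints, credentials and the "orders" table are hypothetical.
    import pymysql

    WRITER = "mycluster.cluster-abc123.eu-west-1.rds.amazonaws.com"
    READER = "mycluster.cluster-ro-abc123.eu-west-1.rds.amazonaws.com"

    def connect(host):
        return pymysql.connect(host=host, user="app", password="...",
                               database="shop", autocommit=True)

    write_conn = connect(WRITER)  # all INSERT/UPDATE/DELETE go here
    read_conn = connect(READER)   # SELECTs go here; each new connection lands on a replica

    with write_conn.cursor() as cur:
        cur.execute("INSERT INTO orders (id, status) VALUES (%s, %s)", (42, "NEW"))

    with read_conn.cursor() as cur:
        cur.execute("SELECT status FROM orders WHERE id = %s", (42,))
        print(cur.fetchone())  # may briefly lag behind the write (async replication)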

AWS to non-AWS SSH

I am setting up the architecture for an AWS project and I am pondering which AWS service to use.
I have some data stored in RDS (MySQL or Oracle) in AWS. The use case demands sending the data from RDS to a non-AWS instance over SSH. As the data is stored in RDS, I need to send some formatted/massaged data to a client (non-AWS instance) via SSH, by either enabling an SSH channel from the RDS (EC2) instance, which I would prefer not to do, or using something else from the AWS umbrella, like Lambda functions. The data I need to send will be in CSV format, a few KB or small MB in size, so I don't need a big ETL tool for this.
The data in RDS will be populated via AWS Lambda.
Spinning up a separate EC2 instance just for this (to SSH to the client) would really be overkill.
What are the options I have?
You can always take advantage of the AWS serverless umbrella.
If you want to massage the data and then push it over SSH to a non-AWS instance, you can use AWS Glue to do your processing and orchestrate it using Glue workflows.
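The SSH/SFTP push itself is only a few lines of Python using paramiko, which could run inside a Glue Python shell job (or a Lambda) as long as the library is packaged with the job; the host, user, key location and paths below are hypothetical:

    # Sketch: push a small CSV to a non-AWS host over SFTP with paramiko.
    # Hostname, user, key location and remote path are all hypothetical.
    import paramiko

    def push_csv(local_path="/tmp/export.csv", remote_path="/incoming/export.csv"):
        ssh = paramiko.SSHClient()
        ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())  # or load known_hosts
        ssh.connect("client.example.com", username="deploy",
                    key_filename="/tmp/id_rsa")  # key fetched e.g. from a secrets store
        sftp = ssh.open_sftp()
        sftp.put(local_path, remote_path)
        sftp.close()
        ssh.close()

    if __name__ == "__main__":
        push_csv()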

How to use AWS DMS from one region to another?

I am trying to use AWS DMS to move data from a source database (AWS RDS MySQL) in the Paris region (eu-west-3) to a target database (AWS Redshift) in the Ireland region (eu-west-1). The goal is to continuously replicate ongoing changes.
I am running into this kind of error:
An error occurred (InvalidResourceStateFault) when calling the CreateEndpoint operation: The redshift cluster datawarehouse needs to be in the same region as the current region. The cluster's region is eu-west-1 and the current region is eu-west-3.
The documentation says:
The only requirement to use AWS DMS is that one of your endpoints must be on an AWS service.
So what I am trying to do should be possible. In practice, it seems it's not allowed.
How do I use AWS DMS from one region to another?
In what region should my endpoints be?
In what region should my replication task be?
My replication instance has to be in the same region as the RDS MySQL instance because they need to share a subnet.
AWS provides this whitepaper called "Migrating AWS Resources to a New AWS Region", updated last year. You may want to contact their support, but an idea would be to move your RDS to another RDS in the proper region, before migrating to Redshift. In the whitepaper, they provide an alternative way to migrate RDS (without DMS, if you don't want to use it for some reason):
1) Stop all transactions or take a snapshot (however, changes after this point in time are lost and might need to be reapplied to the target Amazon RDS DB instance).
2) Using a temporary EC2 instance, dump all data from Amazon RDS to a file (for MySQL, a sketch of this dump and the later import follows the list):
   For MySQL, make use of the mysqldump tool. You might want to compress this dump (see bzip or gzip).
   For MS SQL, use the bcp utility to export data from the Amazon RDS SQL DB instance into files. You can use the SQL Server Generate and Publish Scripts Wizard to create scripts for an entire database or for just selected objects. Note: Amazon RDS does not support Microsoft SQL Server backup file restores.
   For Oracle, use the Oracle Export/Import utility or the Data Pump feature (see http://aws.amazon.com/articles/AmazonRDS/4173109646282306).
   For PostgreSQL, you can use the pg_dump command to export data.
3) Copy this data to an instance in the target region using standard tools such as CP, FTP, or Rsync.
4) Start a new Amazon RDS DB instance in the target region, using the new Amazon RDS security group.
5) Import the saved data.
6) Verify that the database is active and your data is present.
7) Delete the old Amazon RDS DB instance in the source region.
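For the MySQL case, steps 2 and 5 boil down to one mysqldump and one mysql invocation; a rough Python wrapper, with hypothetical endpoints, database name and credentials:

    # Sketch of step 2 (dump) and step 5 (import) for MySQL, run from the temporary
    # EC2 instance. Endpoints, database name and credentials are placeholders.
    import subprocess

    SRC = "mydb.abc123.eu-west-3.rds.amazonaws.com"       # source RDS endpoint
    DST = "mydb-new.def456.eu-west-1.rds.amazonaws.com"   # target RDS endpoint

    # Dump the source database to a compressed file
    with open("dump.sql.gz", "wb") as out:
        dump = subprocess.Popen(
            ["mysqldump", "-h", SRC, "-u", "admin", "--password=secret",
             "--single-transaction", "mydb"],
            stdout=subprocess.PIPE)
        subprocess.run(["gzip", "-c"], stdin=dump.stdout, stdout=out, check=True)
        dump.wait()

    # Load the dump into the new instance in the target region
    with open("dump.sql.gz", "rb") as inp:
        gunzip = subprocess.Popen(["gunzip", "-c"], stdin=inp, stdout=subprocess.PIPE)
        subprocess.run(["mysql", "-h", DST, "-u", "admin", "--password=secret", "mydb"],
                       stdin=gunzip.stdout, check=True)
        gunzip.wait()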
I found a workaround that I am currently testing.
I declare "Postgres" as the engine type for the Redshift cluster. It tricks AWS DMS into thinking it's an external database and AWS DMS no longer checks for regions.
I think it will result in degraded performance, because DMS will probably feed data to Redshift using INSERTs instead of the COPY command.
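Concretely, the workaround amounts to registering the Redshift cluster as a generic Postgres target endpoint rather than a Redshift one; a boto3 sketch with a placeholder cluster address and credentials:

    # Sketch of the workaround: declare the Redshift cluster as a plain "postgres"
    # target so DMS skips the same-region check. Address and credentials are
    # placeholders; expect slower loads since DMS will use INSERTs, not COPY.
    import boto3

    dms = boto3.client("dms", region_name="eu-west-3")  # same region as the RDS source

    dms.create_endpoint(
        EndpointIdentifier="datawarehouse-as-postgres",
        EndpointType="target",
        EngineName="postgres",
        ServerName="datawarehouse.abc123.eu-west-1.redshift.amazonaws.com",
        Port=5439,
        Username="dms_user",
        Password="...",
        DatabaseName="analytics",
    )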
Currently Redshift has to be in the same region as the replication instance.
The Amazon Redshift cluster must be in the same AWS account and the same AWS Region as the replication instance.
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.Redshift.html
So one should create the replication instance in the Redshift region, inside a VPC.
Then use VPC peering to enable the replication instance to connect to the VPC of the MySQL instance in the other region:
https://docs.aws.amazon.com/vpc/latest/peering/what-is-vpc-peering.html
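A boto3 sketch of that cross-region peering, with placeholder VPC IDs (route tables and security groups on both sides still have to allow the traffic):

    # Sketch: peer the replication instance's VPC (eu-west-1, with Redshift) with
    # the RDS MySQL VPC (eu-west-3). VPC IDs are placeholders.
    import boto3

    ec2_ireland = boto3.client("ec2", region_name="eu-west-1")
    peering = ec2_ireland.create_vpc_peering_connection(
        VpcId="vpc-0aaaaaaaaaaaaaaaa",      # VPC of the replication instance
        PeerVpcId="vpc-0bbbbbbbbbbbbbbbb",  # VPC of the RDS MySQL instance
        PeerRegion="eu-west-3",
    )
    pcx_id = peering["VpcPeeringConnection"]["VpcPeeringConnectionId"]

    # Accept the peering request from the Paris side once it is visible there
    ec2_paris = boto3.client("ec2", region_name="eu-west-3")
    ec2_paris.get_waiter("vpc_peering_connection_exists").wait(
        VpcPeeringConnectionIds=[pcx_id])
    ec2_paris.accept_vpc_peering_connection(VpcPeeringConnectionId=pcx_id)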

Does your Amazon Redshift database need to be in the same region as your Machine Learning model?

When trying to use Amazon Redshift to create a datasource for my Machine Learning model, I encountered the following error when testing the access of my IAM role:
There is no '' cluster, or the cluster is not in the same region as your Amazon ML service. Specify a cluster in the same region as the Amazon ML service.
Is there any way around this? It would be a huge pain, since all of our development team's data is stored in a region that Machine Learning doesn't work in.
That's an interesting situation to be in.
What you can probably do:
1) Wait for Amazon Web Services to support Amazon ML in your preferred region (that's a long wait, though).
2) Or create a backup plan for your Redshift data (a sketch of the cross-region snapshot copy follows below).
Amazon Redshift provides some default tools to back up your cluster via snapshots to Amazon Simple Storage Service (Amazon S3). These snapshots can be restored in any AZ in that region or transferred automatically to other regions, wherever you want (in your case, the region where your ML is running).
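If you go that route, the cross-region snapshot copy can be automated; a boto3 sketch with a hypothetical cluster name and regions:

    # Sketch: have Redshift automatically copy new snapshots to a region where
    # Amazon ML is available. Regions, cluster name and retention are hypothetical.
    import boto3

    redshift = boto3.client("redshift", region_name="eu-central-1")  # cluster's region
    redshift.enable_snapshot_copy(
        ClusterIdentifier="datawarehouse",
        DestinationRegion="eu-west-1",  # a region where Amazon ML is offered
        RetentionPeriod=7,              # days to keep the copied snapshots
    )
    # A copied snapshot can then be restored in the destination region with
    # restore_from_cluster_snapshot, giving Amazon ML a same-region cluster.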
There is (probably) no other way to use Amazon ML with Redshift when they are in different regions.
Hope it helps!