AWS Aurora Read Replicas - amazon-web-services

We have a couple of microservices running in AWS on EKS. So far the data has been stored in an Oracle database that sits on premises, with a Kafka topic between the microservices and the Oracle DB.
Now we plan to move to AWS Aurora and have the database in the cloud as well. So far our microservices (implemented in Spring Boot) do not have any cloud-specific code, i.e. no AWS SDK integration, so our codebase is cloud agnostic and we plan to keep it that way.
We plan to write a new microservice which will interact with Aurora; any other service that wants to read/write data in Aurora will call this new service.
For this new microservice, do we need to use the AWS SDK? If we just use the Aurora URL for data operations, will there be any performance impact compared to using the AWS SDK and Aurora APIs for data storage/retrieval?
We plan to have one master Aurora DB and 2 read replicas. As I understand it, write operations will be redirected to the master and read operations will be redirected to an internal LB for the read replicas.

do we need to use the AWS SDK
For management operations only. You can't use the AWS SDK or the AWS API to actually read and write data from Aurora (you can only do this with Aurora Serverless and the Data API).
If we just use the Aurora URL for data operations
There is no other choice. You have to use Aurora endpoints.
As I understand it, write operations will be redirected to the master and read operations will be redirected to an internal LB for the read replicas.
Not automatically. Your write operations must be explicitly directed to the writer endpoint. This means that in your app you have to use the writer endpoint when you write and the reader endpoint when you read.
The load balancing for the replicas happens at the connection level, not at the operation level. But yes, each connection to the reader endpoint will go to one of the replicas.
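To make the last point concrete, here is a minimal Spring Boot sketch that wires one DataSource to the cluster (writer) endpoint and a second one to the reader endpoint. It assumes Aurora MySQL and the Spring Boot JDBC starter; the hostnames, database name and credentials are placeholders.

    import javax.sql.DataSource;
    import org.springframework.boot.jdbc.DataSourceBuilder;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.context.annotation.Primary;

    @Configuration
    public class AuroraDataSourceConfig {

        // Cluster (writer) endpoint: use this DataSource for all writes
        @Bean
        @Primary
        public DataSource writerDataSource() {
            return DataSourceBuilder.create()
                    .url("jdbc:mysql://my-cluster.cluster-xxxx.us-east-1.rds.amazonaws.com:3306/mydb") // placeholder
                    .username("app_user")      // placeholder
                    .password("app_password")  // placeholder
                    .build();
        }

        // Reader endpoint: each new connection lands on one of the read replicas
        @Bean
        public DataSource readerDataSource() {
            return DataSourceBuilder.create()
                    .url("jdbc:mysql://my-cluster.cluster-ro-xxxx.us-east-1.rds.amazonaws.com:3306/mydb") // placeholder
                    .username("app_user")      // placeholder
                    .password("app_password")  // placeholder
                    .build();
        }
    }

Repositories or JdbcTemplates that only read would then be built on readerDataSource, while anything that writes uses writerDataSource; Aurora will not route individual statements for you.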

Related

AWS to non-AWS SSH

I am setting up the architecture for an AWS project and I am pondering which AWS service to use.
I have some data stored in RDS (MySQL or Oracle) in AWS. The use case demands pushing the data from RDS to a non-AWS instance over SSH. As the data is stored in RDS, I need to send some formatted/massaged data to a client (a non-AWS instance) via SSH, either by enabling an SSH channel from an RDS/EC2 instance, which I'd rather not do, or by using something else from the AWS umbrella, like Lambda functions. The data I need to send will be in CSV format, in sizes of KBs or small MBs, so I don't need a big ETL tool for this.
The data in RDS will be populated via AWS Lambda.
Spinning up a separate EC2 instance just for this (to SSH to the client) would really be overkill.
What are the options I have?
You can always take advantage of the AWS serverless umbrella.
If you want to massage the data and then push it over SSH to the non-AWS instance, you can use AWS Glue for the processing and orchestrate it with Glue Workflows.
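As a rough illustration of what the push step could look like, here is a small Java sketch using the JSch library over SFTP; it could run from a Lambda function (one of the options mentioned in the question), whereas a Glue job would use a Python or Scala script instead. The JDBC URL, credentials, hostnames, table and paths are all placeholders.

    import java.io.ByteArrayInputStream;
    import java.nio.charset.StandardCharsets;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import com.jcraft.jsch.ChannelSftp;
    import com.jcraft.jsch.JSch;
    import com.jcraft.jsch.Session;

    public class RdsCsvToSftp {
        public static void main(String[] args) throws Exception {
            // 1) Pull the rows out of RDS (JDBC URL and credentials are placeholders)
            StringBuilder csv = new StringBuilder("id,name\n");
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:mysql://my-rds-endpoint:3306/mydb", "user", "password");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT id, name FROM my_table")) {
                while (rs.next()) {
                    csv.append(rs.getLong("id")).append(',').append(rs.getString("name")).append('\n');
                }
            }

            // 2) Push the CSV to the non-AWS host over SFTP
            JSch jsch = new JSch();
            jsch.addIdentity("/path/to/private-key");          // placeholder key file
            Session session = jsch.getSession("client-user", "client.example.com", 22);
            session.setConfig("StrictHostKeyChecking", "no");  // or manage known_hosts properly
            session.connect();
            ChannelSftp sftp = (ChannelSftp) session.openChannel("sftp");
            sftp.connect();
            sftp.put(new ByteArrayInputStream(csv.toString().getBytes(StandardCharsets.UTF_8)),
                     "/incoming/data.csv");
            sftp.disconnect();
            session.disconnect();
        }
    }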

What are strategies for bridging Google Cloud with AWS?

Let's say a company has an application with a database hosted on AWS and also has a read replica on AWS. Then that same company wants to build out a data analytics infrastructure in Google Cloud -- to take advantage of data analysis and ML services in Google Cloud.
Is it necessary to create an additional read replica within the Google Cloud context? If not, is there an alternative strategy that is frequently used in this context to bridge the two cloud services?
While services like Amazon Relational Database Service (RDS) provide read-replica capabilities, this only works between managed database instances on AWS.
If you are replicating a database between providers, then you are probably running the database yourself on virtual machines rather than using a managed service. This means the databases appear just like any resource on the Internet, so you can connect them exactly the way you would connect two resources across the internet. However, you would be responsible for managing, monitoring, deploying, etc. This takes away from much of the benefit of using cloud services.
Replicating between storage services like Amazon S3 would be easier since it is just raw data rather than a running database. Also, Big Data is normally stored in raw format rather than being loaded into a database.
If the existing infrastructure is on a cloud provider, then try to perform the remaining activities on the same cloud provider.
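To illustrate the object-storage route, here is a minimal sketch that copies a single object from S3 into a GCS bucket, assuming the AWS SDK for Java v2 and the Google Cloud Storage Java client; the bucket names and object key are placeholders.

    import com.google.cloud.storage.BlobId;
    import com.google.cloud.storage.BlobInfo;
    import com.google.cloud.storage.Storage;
    import com.google.cloud.storage.StorageOptions;
    import software.amazon.awssdk.core.ResponseBytes;
    import software.amazon.awssdk.services.s3.S3Client;
    import software.amazon.awssdk.services.s3.model.GetObjectRequest;
    import software.amazon.awssdk.services.s3.model.GetObjectResponse;

    public class S3ToGcsCopy {
        public static void main(String[] args) {
            String s3Bucket = "my-aws-bucket";   // placeholder
            String gcsBucket = "my-gcp-bucket";  // placeholder
            String key = "exports/data.csv";     // placeholder

            try (S3Client s3 = S3Client.create()) {
                // Download the object from S3 into memory (fine for small/medium files)
                ResponseBytes<GetObjectResponse> object = s3.getObjectAsBytes(
                        GetObjectRequest.builder().bucket(s3Bucket).key(key).build());

                // Upload the same bytes to Google Cloud Storage
                Storage gcs = StorageOptions.getDefaultInstance().getService();
                BlobInfo blobInfo = BlobInfo.newBuilder(BlobId.of(gcsBucket, key)).build();
                gcs.create(blobInfo, object.asByteArray());
            }
        }
    }

For anything beyond small one-off copies, a managed transfer service would be a better fit; the point is only that raw objects move between clouds far more easily than a live database.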

Does your Amazon Redshift database need to be in the same region as your Machine Learning model?

When trying to use Amazon Redshift to create a datasource for my Machine Learning model, I encountered the following error when testing the access of my IAM role:
There is no '' cluster, or the cluster is not in the same region as your Amazon ML service. Specify a cluster in the same region as the Amazon ML service.
Is there any way around this? It would be a huge pain, since all of our development team's data is stored in a region that Machine Learning doesn't work in.
That's an interesting situation to be in.
What you can probably do:
1) Wait for Amazon Web Services to support Amazon ML in your preferred region. (That's a long wait, though.)
2) Or create a backup plan for your Redshift data (a sketch of enabling cross-region snapshot copy follows below).
Amazon Redshift provides built-in tools to back up your cluster via snapshots to Amazon Simple Storage Service (Amazon S3). These snapshots can be restored in any AZ in that region or copied automatically to other regions (in your case, the region where your ML is running).
There is (probably) no other way to use Amazon ML with Redshift in a different region.
Hope it helps!
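If you go the snapshot route, here is a minimal sketch of enabling automatic cross-region snapshot copy with the AWS SDK for Java v2 Redshift client; the cluster identifier and both regions are placeholders you would replace with your own.

    import software.amazon.awssdk.regions.Region;
    import software.amazon.awssdk.services.redshift.RedshiftClient;
    import software.amazon.awssdk.services.redshift.model.EnableSnapshotCopyRequest;

    public class EnableCrossRegionSnapshotCopy {
        public static void main(String[] args) {
            // Client talks to the region where the cluster currently lives (placeholder)
            try (RedshiftClient redshift = RedshiftClient.builder()
                    .region(Region.EU_WEST_1)
                    .build()) {

                // Copy automated snapshots to the region where Amazon ML is available (placeholder)
                redshift.enableSnapshotCopy(EnableSnapshotCopyRequest.builder()
                        .clusterIdentifier("my-redshift-cluster")  // placeholder
                        .destinationRegion("us-east-1")            // placeholder
                        .retentionPeriod(7)                        // keep copies for 7 days
                        .build());
            }
        }
    }

The same thing can be done from the console or CLI; the point is that the snapshots, not the live cluster, are what crosses the region boundary.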

Simplest way to get data from AWS MySQL RDS to AWS Elasticsearch?

I have data in an AWS RDS, and I would like to pipe it over to an AWS ES instance, preferably updating once an hour, or similar.
On my local machine, with a local mysql database and Elasticsearch database, it was easy to set this up using Logstash.
Is there a "native" AWS way to do the same thing? Or do I need to set up an EC2 server and install Logstash on it myself?
You can achieve the same thing with your local Logstash: simply point your jdbc input at your RDS database and the elasticsearch output at your AWS ES instance. If you need to run this regularly, then yes, you'd need to set up a small instance to run Logstash on.
A more "native" AWS solution to achieve the same thing would include the use of Amazon Kinesis and AWS Lambda.
Here's a good article explaining how to connect it all together, namely:
- how to stream RDS data into a Kinesis stream
- how to configure a Lambda function to handle the stream (a minimal sketch of this part is shown below)
- how to push the data to your AWS ES instance
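As a rough sketch of the middle step, the Lambda below (Java, using the aws-lambda-java-events and Elasticsearch low-level REST client libraries) forwards each Kinesis record, assumed to already be a JSON document, into an index on the ES domain. The endpoint and index name are placeholders, and it assumes the domain accepts unsigned HTTPS requests; with an IAM-restricted domain you would also need to sign them.

    import com.amazonaws.services.lambda.runtime.Context;
    import com.amazonaws.services.lambda.runtime.RequestHandler;
    import com.amazonaws.services.lambda.runtime.events.KinesisEvent;
    import org.apache.http.HttpHost;
    import org.elasticsearch.client.Request;
    import org.elasticsearch.client.RestClient;

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;

    // Hypothetical handler: indexes every incoming Kinesis record into Elasticsearch
    public class KinesisToEsHandler implements RequestHandler<KinesisEvent, Void> {

        // Placeholder endpoint of the AWS Elasticsearch domain
        private static final String ES_ENDPOINT = "search-mydomain.us-east-1.es.amazonaws.com";

        @Override
        public Void handleRequest(KinesisEvent event, Context context) {
            try (RestClient client = RestClient.builder(new HttpHost(ES_ENDPOINT, 443, "https")).build()) {
                for (KinesisEvent.KinesisEventRecord record : event.getRecords()) {
                    // Decode the record payload (assumed to be a JSON document)
                    String json = StandardCharsets.UTF_8
                            .decode(record.getKinesis().getData())
                            .toString();
                    Request indexRequest = new Request("POST", "/rds-data/_doc"); // placeholder index
                    indexRequest.setJsonEntity(json);
                    client.performRequest(indexRequest);
                }
            } catch (IOException e) {
                throw new RuntimeException("Failed to index records into Elasticsearch", e);
            }
            return null;
        }
    }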
Take a look at Amazon DMS. It's usually used for DB migrations; however, it also supports continuous data replication. This might simplify the process and be cost-effective.
You can use AWS Database Migration Service to perform continuous data replication. Continuous data replication has a multitude of use cases, including Disaster Recovery instance synchronization, geographic database distribution and Dev/Test environment synchronization. You can use DMS for both homogeneous and heterogeneous data replication for all supported database engines. The source or destination databases can be located on your own premises outside of AWS, running on an Amazon EC2 instance, or be an Amazon RDS database. You can replicate data from a single database to one or more target databases, or consolidate data from multiple source databases and replicate it to one or more target databases.
https://aws.amazon.com/dms/
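For illustration only, creating such a continuous-replication (full load + CDC) task with the AWS SDK for Java v2 might look roughly like this; every ARN and identifier below is a placeholder, and the source/target endpoints and replication instance are assumed to exist already.

    import software.amazon.awssdk.services.databasemigration.DatabaseMigrationClient;
    import software.amazon.awssdk.services.databasemigration.model.CreateReplicationTaskRequest;
    import software.amazon.awssdk.services.databasemigration.model.MigrationTypeValue;

    public class CreateDmsReplicationTask {
        public static void main(String[] args) {
            try (DatabaseMigrationClient dms = DatabaseMigrationClient.create()) {
                dms.createReplicationTask(CreateReplicationTaskRequest.builder()
                        .replicationTaskIdentifier("rds-to-es-sync")             // placeholder
                        .sourceEndpointArn("arn:aws:dms:region:acct:endpoint:SOURCE")   // placeholder
                        .targetEndpointArn("arn:aws:dms:region:acct:endpoint:TARGET")   // placeholder
                        .replicationInstanceArn("arn:aws:dms:region:acct:rep:INSTANCE") // placeholder
                        .migrationType(MigrationTypeValue.FULL_LOAD_AND_CDC)     // full load, then ongoing replication
                        // Select every table in every schema (standard DMS table-mapping JSON)
                        .tableMappings("{\"rules\":[{\"rule-type\":\"selection\",\"rule-id\":\"1\",\"rule-name\":\"1\",\"object-locator\":{\"schema-name\":\"%\",\"table-name\":\"%\"},\"rule-action\":\"include\"}]}")
                        .build());
            }
        }
    }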

AWS DynamoDB vs Elastic Beanstalk. What serves my purpose better?

The Parse migration guide recommends using Elastic Beanstalk if we move over to AWS. The more I read about AWS services, the more I think DynamoDB is the better choice. DynamoDB and Elastic Beanstalk both use NoSQL. Does anyone know the obvious difference between the two? The ability to handle many small but frequent requests is important for my project.
DynamoDB is the ultimate scalable NoSQL database system. Go with Dynamo.
It handles many small requests very well.
Contrary to what the comments say, Elastic Beanstalk is NOT a web server, and it is NOT a database. Elastic Beanstalk is an AWS service that helps users quickly provision other AWS services, such as compute (think EC2) and storage (think S3 or DynamoDB), and set up monitoring and deployment of the user's application on these resources. With Beanstalk you can deploy your applications and retain control over the underlying AWS resources. In your case, you might use Elastic Beanstalk to deploy a MongoDB database server to store your Parse data.
DynamoDB, on the other hand, is a managed, distributed, highly available and scalable non-relational (NoSQL) database provided as an AWS service. Dynamo is in some ways comparable to MongoDB (they can both store data and they are both non-relational), but where Mongo is a system that you have to manage and deploy yourself (perhaps with the help of Elastic Beanstalk), Dynamo is a fully managed system where you only have to worry about your application logic. In your case you'd be replacing MongoDB with DynamoDB, which frees you up to focus on your application instead of having to worry about maintaining MongoDB (i.e. updating it and the host OS when new releases come out, etc.).
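For the "many small but frequent requests" use case, working with DynamoDB from Java is just a couple of SDK calls. A minimal sketch, assuming the AWS SDK for Java v2 and a hypothetical table named ParseObjects with objectId as its partition key:

    import java.util.Map;
    import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
    import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
    import software.amazon.awssdk.services.dynamodb.model.GetItemRequest;
    import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;

    public class DynamoQuickStart {
        public static void main(String[] args) {
            String table = "ParseObjects"; // hypothetical table name

            try (DynamoDbClient dynamo = DynamoDbClient.create()) {
                // Write one small item
                dynamo.putItem(PutItemRequest.builder()
                        .tableName(table)
                        .item(Map.of(
                                "objectId", AttributeValue.builder().s("abc123").build(),
                                "payload",  AttributeValue.builder().s("{\"score\":42}").build()))
                        .build());

                // Read it back by primary key
                Map<String, AttributeValue> item = dynamo.getItem(GetItemRequest.builder()
                        .tableName(table)
                        .key(Map.of("objectId", AttributeValue.builder().s("abc123").build()))
                        .build())
                        .item();

                System.out.println(item);
            }
        }
    }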