I have a web application that uses two databases. I want to schedule a SQL query to run at set intervals, selecting data from the first database and inserting it into a table in the second one, which is where the web application reads its data from.
You can do that using a scheduled Lambda function.
From Schedule jobs for Amazon RDS and Aurora PostgreSQL using Lambda and Secrets Manager
For on-premises databases and databases that are hosted on Amazon Elastic Compute Cloud (Amazon EC2) instances, database administrators often use the cron utility to schedule jobs. For example, a job for data extraction or a job for data purging can easily be scheduled using cron. For these jobs, database credentials are typically either hard-coded or stored in a properties file. However, when you migrate to Amazon Relational Database Service (Amazon RDS) or Amazon Aurora PostgreSQL, you lose the ability to log in to the host instance to schedule cron jobs. This pattern describes how to use AWS Lambda and AWS Secrets Manager to schedule jobs for Amazon RDS and Aurora PostgreSQL databases after migration.
https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/schedule-jobs-for-amazon-rds-and-aurora-postgresql-using-lambda-and-secrets-manager.html
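For a concrete picture, here is a minimal sketch of such a scheduled handler, assuming both databases are PostgreSQL, that psycopg2 is bundled with the function, and that the connection details live in two hypothetical Secrets Manager secrets (`source-db-secret`, `target-db-secret`); the table and column names are placeholders:

```python
import json
import boto3
import psycopg2  # must be bundled in the deployment package or a Lambda layer


def get_conn(secret_name):
    """Fetch credentials from Secrets Manager and open a connection."""
    secret = json.loads(
        boto3.client("secretsmanager").get_secret_value(SecretId=secret_name)["SecretString"]
    )
    return psycopg2.connect(
        host=secret["host"], port=secret["port"], dbname=secret["dbname"],
        user=secret["username"], password=secret["password"],
    )


def lambda_handler(event, context):
    # Triggered on a schedule by an EventBridge (CloudWatch Events) rule.
    src, dst = get_conn("source-db-secret"), get_conn("target-db-secret")
    try:
        with src.cursor() as read_cur, dst.cursor() as write_cur:
            read_cur.execute("SELECT id, payload FROM source_table")  # placeholder query
            for row in read_cur:
                # Upsert by primary key so repeated runs update rows instead of duplicating them.
                write_cur.execute(
                    "INSERT INTO target_table (id, payload) VALUES (%s, %s) "
                    "ON CONFLICT (id) DO UPDATE SET payload = EXCLUDED.payload",
                    row,
                )
        dst.commit()
    finally:
        src.close()
        dst.close()
```

The EventBridge rule (e.g. a `rate(1 hour)` schedule expression) is what gives you the "set intervals".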
Related
I want to periodically sync my Cloud SQL Postgres DB with an Amazon Aurora DB.
Setting up a Lambda that runs pg_dump and psql from one to the other should work, but is there a cleaner solution?
You can take a look at the Database Migration Service in AWS. As long as one of the databases is in AWS (which Aurora is), you can use it to perform full loads plus continuous migration using Change Data Capture. PostgreSQL is supported, so you'd have to create a replication instance, configure the Postgres and Aurora endpoints, and set up a replication task.
Reference
Creating tasks for ongoing replication using AWS DMS
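For orientation, a hedged boto3 sketch of the endpoint setup that answer describes; the identifiers, hostnames, and credentials are all placeholders, and the Cloud SQL instance must be network-reachable from the DMS replication instance:

```python
import boto3

dms = boto3.client("dms")

# Source: the external PostgreSQL database (here, Cloud SQL), treated as a plain
# "postgres" endpoint since it lives outside AWS.
dms.create_endpoint(
    EndpointIdentifier="cloudsql-source",       # placeholder name
    EndpointType="source",
    EngineName="postgres",
    ServerName="cloudsql.example.com",          # placeholder host
    Port=5432,
    DatabaseName="appdb",
    Username="dms_user",
    Password="...",                             # better pulled from Secrets Manager
)

# Target: the Aurora PostgreSQL cluster.
dms.create_endpoint(
    EndpointIdentifier="aurora-target",         # placeholder name
    EndpointType="target",
    EngineName="aurora-postgresql",
    ServerName="mycluster.cluster-xyz.eu-west-1.rds.amazonaws.com",  # placeholder host
    Port=5432,
    DatabaseName="appdb",
    Username="dms_user",
    Password="...",
)
```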
I ended up running a pg_dump and psql cron job on an EC2 instance. For some reason, I could not get DMS to accept the GCP database as an endpoint.
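For reference, that cron job boils down to piping pg_dump into psql; here is a rough Python wrapper around the same pipeline, with placeholder hosts and database names, assuming passwords are supplied via ~/.pgpass:

```python
import subprocess

# Stream a dump of the Cloud SQL database straight into the Aurora cluster.
# --clean --if-exists drops objects before recreating them, so repeated runs
# replace the data instead of duplicating it.
dump = subprocess.Popen(
    ["pg_dump", "--clean", "--if-exists",
     "--host", "cloudsql.example.com", "--dbname", "appdb"],        # placeholder source
    stdout=subprocess.PIPE,
)
subprocess.run(
    ["psql", "--host", "aurora.example.com", "--dbname", "appdb"],  # placeholder target
    stdin=dump.stdout, check=True,
)
dump.stdout.close()
dump.wait()
```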
I am looking to migrate some databases (terabytes in size) from Oracle, MS SQL Server, and Sybase to PostgreSQL.
I came across AWS SCT as one of the tools which might be useful.
I have a few questions regarding the viability and suitability of SCT for the task:
Is AWS SCT able to migrate databases (both data and schema) from Oracle, MS SQL Server, and Sybase to PostgreSQL?
Is AWS SCT able to migrate triggers, queries, views, and stored procedures?
Does the migration happen on-premises, or do I have to upload/transfer my data to AWS compute or S3 buckets, convert it there, and download it back?
What is the largest database size that can be migrated using AWS SCT (gigabytes/terabytes/records)?
How fast is the migration (data per unit of time)?
What is the accuracy of the migration (percent)?
How do I use the AWS SCT tool? (For example: Is it available for public download? Are there any paid licenses? Does it come bundled with some other service which we have to buy?)
I would also appreciate any extra information about AWS services that might help my project.
The AWS Schema Conversion Tool only converts the schema. It does not transfer the data.
From What Is the AWS Schema Conversion Tool?:
You can use the AWS Schema Conversion Tool (AWS SCT) to convert your existing database schema from one database engine to another. You can convert relational OLTP schema, or data warehouse schema. Your converted schema is suitable for an Amazon Relational Database Service (Amazon RDS) MySQL, MariaDB, Oracle, SQL Server, PostgreSQL DB, an Amazon Aurora DB cluster, or an Amazon Redshift cluster. The converted schema can also be used with a database on an Amazon EC2 instance or stored as data on an Amazon S3 bucket.
To transfer the data, you could use the AWS Database Migration Service (AWS DMS).
From What is AWS Database Migration Service?:
AWS Database Migration Service (AWS DMS) is a cloud service that makes it easy to migrate relational databases, data warehouses, NoSQL databases, and other types of data stores. You can use AWS DMS to migrate your data into the AWS Cloud, between on-premises instances (through an AWS Cloud setup), or between combinations of cloud and on-premises setups.
The Schema Conversion Tool can convert triggers, but please consult the documentation for capabilities/limits.
The Schema Conversion Tool is free. See: Installing, verifying, and updating the AWS SCT
The Database Migration Service involves a server, so it has associated costs. See: AWS Database Migration Service Pricing - Amazon Web Services
We need to replicate data from an Amazon RDS (MySQL) database to an Aurora PostgreSQL database. Each database is in a different AWS account and region.
The data must be replicated to the PostgreSQL instance every 6 hours, and we need a guarantee that records are not duplicated, even when a field of an existing record is updated.
Which method, tool, or design is best for this? (It could be something outside AWS.)
You could use one of the following services provided by AWS:
Database Migration Service
Glue
The AWS Database Migration Service supports:
Using an Amazon-Managed MySQL-Compatible Database as a Source for AWS DMS
Using a PostgreSQL Database as a Target for AWS Database Migration Service
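To make the DMS route concrete, a hedged sketch of the replication task itself; the ARNs are placeholders, and `full-load-and-cdc` gives the initial copy plus ongoing change capture, which is what keeps an updated field from producing a duplicate record:

```python
import json
import boto3

dms = boto3.client("dms")

# Select every table in the source schema; the rules can be narrowed per table.
table_mappings = {
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-all",
        "object-locator": {"schema-name": "appdb", "table-name": "%"},  # placeholder schema
        "rule-action": "include",
    }]
}

dms.create_replication_task(
    ReplicationTaskIdentifier="mysql-to-aurora-pg",           # placeholder
    SourceEndpointArn="arn:aws:dms:...:endpoint:SRC",         # placeholder ARN
    TargetEndpointArn="arn:aws:dms:...:endpoint:TGT",         # placeholder ARN
    ReplicationInstanceArn="arn:aws:dms:...:rep:INSTANCE",    # placeholder ARN
    MigrationType="full-load-and-cdc",
    TableMappings=json.dumps(table_mappings),
)
```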
I am trying to use AWS DMS to move data from a source database (AWS RDS MySQL) in the Paris region (eu-west-3) to a target database (AWS Redshift) in the Ireland region (eu-west-1). The goal is to continuously replicate ongoing changes.
I am running into this kind of error:
An error occurred (InvalidResourceStateFault) when calling the CreateEndpoint operation: The redshift cluster datawarehouse needs to be in the same region as the current region. The cluster's region is eu-west-1 and the current region is eu-west-3.
The documentation says:
The only requirement to use AWS DMS is that one of your endpoints must be on an AWS service.
So what I am trying to do should be possible. In practice, it seems it's not allowed.
How do I use AWS DMS from one region to another?
In what region should my endpoints be?
In what region should my replication task be?
My replication instance has to be in the same region as the RDS MySQL instance, because they need to share a subnet.
AWS provides a whitepaper called "Migrating AWS Resources to a New AWS Region", updated last year. You may want to contact their support, but one idea would be to move your RDS instance to another RDS instance in the proper region before migrating to Redshift. In the whitepaper, they provide an alternative way to migrate RDS (without DMS, if you don't want to use it for some reason):
1. Stop all transactions or take a snapshot (however, changes after this point in time are lost and might need to be reapplied to the target Amazon RDS DB instance).
2. Using a temporary EC2 instance, dump all data from Amazon RDS to a file:
   - For MySQL, make use of the mysqldump tool. You might want to compress this dump (see bzip or gzip); a sketch of this step follows the list.
   - For MS SQL, use the bcp utility to export data from the Amazon RDS SQL DB instance into files. You can use the SQL Server Generate and Publish Scripts Wizard to create scripts for an entire database or for just selected objects. Note: Amazon RDS does not support Microsoft SQL Server backup file restores.
   - For Oracle, use the Oracle Export/Import utility or the Data Pump feature (see http://aws.amazon.com/articles/AmazonRDS/4173109646282306).
   - For PostgreSQL, you can use the pg_dump command to export data.
3. Copy this data to an instance in the target region using standard tools such as cp, FTP, or rsync.
4. Start a new Amazon RDS DB instance in the target region, using the new Amazon RDS security group.
5. Import the saved data.
6. Verify that the database is active and your data is present.
7. Delete the old Amazon RDS DB instance in the source region.
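As an illustration of the MySQL branch of step 2, a rough Python wrapper around mysqldump with compression; the endpoint, user, database, and file names are placeholders:

```python
import subprocess

# Step 2: dump the source RDS MySQL database and compress it.
# --single-transaction gives a consistent snapshot for InnoDB tables.
with open("appdb.sql.gz", "wb") as out:
    dump = subprocess.Popen(
        ["mysqldump", "--host", "source-db.example.com",   # placeholder endpoint
         "--user", "admin", "--single-transaction", "appdb"],
        stdout=subprocess.PIPE,
    )
    subprocess.run(["gzip", "-c"], stdin=dump.stdout, stdout=out, check=True)
    dump.stdout.close()
    dump.wait()

# Step 5, on an instance in the target region, would then be roughly:
#   gunzip -c appdb.sql.gz | mysql --host target-db.example.com --user admin appdb
```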
I found a workaround that I am currently testing.
I declare "postgres" as the engine type for the Redshift cluster's endpoint. It tricks AWS DMS into thinking it's an external database, and AWS DMS no longer checks the regions.
I think it will result in degraded performance, because DMS will probably feed data to Redshift using INSERTs instead of the COPY command.
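In endpoint terms, the workaround amounts to something like this hedged sketch: a plain "postgres" endpoint pointed at the Redshift cluster's address (all identifiers are placeholders), which works because Redshift speaks the PostgreSQL wire protocol:

```python
import boto3

dms = boto3.client("dms", region_name="eu-west-3")  # same region as the replication instance

# Declaring the engine as "postgres" makes DMS treat the Redshift cluster as an
# external database, so it skips the same-region check (and the COPY-based load path).
dms.create_endpoint(
    EndpointIdentifier="redshift-as-postgres",                        # placeholder
    EndpointType="target",
    EngineName="postgres",
    ServerName="datawarehouse.xyz.eu-west-1.redshift.amazonaws.com",  # placeholder host
    Port=5439,                                                        # Redshift default port
    DatabaseName="dev",
    Username="awsuser",
    Password="...",
)
```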
Currently Redshift has to be in the same region as the replication instance.
The Amazon Redshift cluster must be in the same AWS account and the same AWS Region as the replication instance.
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.Redshift.html
So one should create the replication instance in the Redshift region, inside a VPC.
Then use VPC peering to enable the replication instance to connect to the VPC of the MySQL instance in the other region:
https://docs.aws.amazon.com/vpc/latest/peering/what-is-vpc-peering.html
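A hedged sketch of that peering step with boto3; the VPC IDs are placeholders, and the route tables and security groups on both sides still need entries for the peered CIDR ranges:

```python
import boto3

# Request peering from the replication instance's VPC (Redshift region, eu-west-1)
# to the RDS MySQL VPC in eu-west-3.
ec2 = boto3.client("ec2", region_name="eu-west-1")
peering = ec2.create_vpc_peering_connection(
    VpcId="vpc-aaaa1111",        # placeholder: replication instance / Redshift VPC
    PeerVpcId="vpc-bbbb2222",    # placeholder: RDS MySQL VPC
    PeerRegion="eu-west-3",
)
pcx_id = peering["VpcPeeringConnection"]["VpcPeeringConnectionId"]

# Inter-region peering is accepted from the peer VPC's own region.
boto3.client("ec2", region_name="eu-west-3").accept_vpc_peering_connection(
    VpcPeeringConnectionId=pcx_id
)
```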
I have data in an AWS RDS instance, and I would like to pipe it over to an AWS Elasticsearch (ES) instance, preferably updating once an hour or similar.
On my local machine, with a local MySQL database and a local Elasticsearch instance, it was easy to set this up using Logstash.
Is there a "native" AWS way to do the same thing? Or do I need to set up an EC2 server and install Logstash on it myself?
You can achieve the same thing with your local Logstash: simply point your jdbc input at your RDS database and the elasticsearch output at your AWS ES instance. If you need to run this regularly, then yes, you'd need to set up a small instance to run Logstash on.
A more "native" AWS solution to achieve the same thing would include the use of Amazon Kinesis and AWS Lambda.
Here's a good article explaining how to connect it all together, namely:
how to stream RDS data into a Kinesis Stream
how to configure a Lambda function to handle the stream
how to push the data to your AWS ES instance
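Very roughly, the Lambda in that pipeline could look like the sketch below; the ES endpoint and index are placeholders, and request signing (e.g. SigV4 via requests-aws4auth) is left out for brevity:

```python
import base64
import json
import urllib.request

ES_ENDPOINT = "https://search-mydomain.eu-west-1.es.amazonaws.com"  # placeholder domain


def lambda_handler(event, context):
    # Each Kinesis record carries a base64-encoded payload, here assumed to be
    # a JSON object for one changed row with a stable primary key "id".
    for record in event["Records"]:
        row = json.loads(base64.b64decode(record["kinesis"]["data"]))

        # Index by primary key so replays overwrite documents instead of duplicating them.
        req = urllib.request.Request(
            f"{ES_ENDPOINT}/myindex/_doc/{row['id']}",  # placeholder index
            data=json.dumps(row).encode(),
            headers={"Content-Type": "application/json"},
            method="PUT",
        )
        urllib.request.urlopen(req)
```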
Take a look at Amazon DMS. It's usually used for DB migrations; however, it also supports continuous data replication. This might simplify the process and be cost-effective.
You can use AWS Database Migration Service to perform continuous data replication. Continuous data replication has a multitude of use cases including Disaster Recovery instance synchronization, geographic database distribution and Dev/Test environment synchronization. You can use DMS for both homogeneous and heterogeneous data replications for all supported database engines. The source or destination databases can be located in your own premises outside of AWS, running on an Amazon EC2 instance, or it can be an Amazon RDS database. You can replicate data from a single database to one or more target databases or data from multiple source databases can be consolidated and replicated to one or more target databases.
https://aws.amazon.com/dms/