Stream Data from SQL Server into Redshift with Kinesis Firehose

The tool below is a batch import method for copying data from SQL Server RDS into Redshift:
AWS Schema Conversion Tool Exports from SQL Server to Amazon Redshift
Is there a more streamlined way of streaming data from MS SQL Server into Redshift with Kinesis Firehose on a continuous (every-second) basis? I know data can be moved from AWS Aurora directly into Redshift with Kinesis.

If your goal is to move data from Microsoft SQL Server into Amazon Redshift, you could consider AWS Database Migration Service (DMS). It can copy data as a one-off job, but it can also replicate on an ongoing basis, keeping the target in sync with the source.
See:
Using a Microsoft SQL Server Database as a Source for AWS DMS - AWS Database Migration Service
Using an Amazon Redshift Database as a Target for AWS Database Migration Service - AWS Database Migration Service
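As an illustration only, here is a minimal boto3 sketch of creating a DMS task that does the initial copy and then keeps replicating changes. It assumes the replication instance and the SQL Server source / Redshift target endpoints already exist; every ARN and name below is a placeholder.

```python
import json
import boto3

dms = boto3.client("dms")

# Include all tables in the dbo schema (placeholder selection rule).
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-all",
            "object-locator": {"schema-name": "dbo", "table-name": "%"},
            "rule-action": "include",
        }
    ]
}

dms.create_replication_task(
    ReplicationTaskIdentifier="sqlserver-to-redshift",
    SourceEndpointArn="arn:aws:dms:...:endpoint:SOURCE",   # SQL Server endpoint (placeholder)
    TargetEndpointArn="arn:aws:dms:...:endpoint:TARGET",   # Redshift endpoint (placeholder)
    ReplicationInstanceArn="arn:aws:dms:...:rep:INSTANCE", # placeholder
    MigrationType="full-load-and-cdc",  # initial copy plus ongoing change capture
    TableMappings=json.dumps(table_mappings),
)
```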

Related

Can we use AWS Glue for analysing an RDS database and storing the analysed data into an RDS MySQL table using ETL?

I am new to AWS and want to use AWS Glue for an ETL process.
Can we use AWS Glue to analyze an RDS database and store the analyzed data into an RDS MySQL table using an ETL job?
Thanks
Yes, it's possible. We store our raw data in S3, read it from there in AWS Glue, and perform UPSERTs to RDS Aurora as part of our ETL process. You can use either an AWS Glue trigger or a Lambda S3 event trigger to start the Glue job.
We used pymysql / mysql.connector in AWS Glue because we have to do UPSERTs. Bulk loading data directly from S3 into RDS MySQL (Aurora) is also supported. Let me know if you need help with a code sample (a rough sketch follows below).
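For what it's worth, a minimal sketch of that UPSERT pattern (not the exact job described above): it assumes the rows have already been read out of S3 via the Glue DynamicFrame, and the host, credentials, table, and columns are placeholders.

```python
import pymysql

# Placeholder connection details for the Aurora/MySQL target.
conn = pymysql.connect(
    host="my-aurora-cluster.cluster-xxxx.us-east-1.rds.amazonaws.com",
    user="etl_user",
    password="...",
    database="analytics",
)

# Rows previously read from S3 in the Glue job (id, name, score) - placeholders.
rows = [(1, "alice", 42), (2, "bob", 17)]

with conn.cursor() as cur:
    # INSERT ... ON DUPLICATE KEY UPDATE gives UPSERT semantics on MySQL/Aurora.
    cur.executemany(
        """
        INSERT INTO scores (id, name, score)
        VALUES (%s, %s, %s)
        ON DUPLICATE KEY UPDATE name = VALUES(name), score = VALUES(score)
        """,
        rows,
    )
conn.commit()
conn.close()
```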

Moving data from S3 -> RDS using AWS Glue

Does AWS Glue provide the ability to move data from an S3 bucket to an RDS database?
I'm trying to set up a serverless app that picks up dynamic data uploaded to S3 and migrates it to RDS.
Glue provides a Crawlers service that determines the schema.
Glue also provides ETL Jobs, but it seems the target can only be another S3 bucket.
Any ideas?
Yes, Glue can write to an RDS data store. If you are using the job wizard, it gives you a target option of "JDBC"; if you select JDBC, you can set up a connection to your RDS instance.
(Screenshots: JDBC connection window and RDS option window.)
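Here is a rough sketch of what such a Glue (PySpark) job script can look like when the target is the RDS connection set up through the wizard. The catalog database/table, the connection name, and the destination table are placeholders, not values the wizard produces.

```python
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Source: a table crawled from S3 into the Glue Data Catalog (placeholder names).
source = glue_context.create_dynamic_frame.from_catalog(
    database="raw_data", table_name="s3_uploads"
)

# Target: the RDS instance behind the Glue JDBC connection created in the wizard.
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=source,
    catalog_connection="my-rds-connection",
    connection_options={"dbtable": "uploads", "database": "appdb"},
)

job.commit()
```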
Please check the link below; it walks through loading data from an S3 bucket into an RDS Oracle instance using AWS Glue:
https://youtu.be/rBFfYpHP1PM

Using MySQL Aurora as source in Athena

I would like to use a few Aurora (MySQL) tables as a source when creating external tables in AWS Athena, because those tables are mutable and get updated often. I see Hive and Presto support this through org.apache.hadoop.hive.jdbc.storagehandler.JdbcStorageHandler. Is there an equivalent feature in AWS Athena?
Amazon Athena is dedicated to running interactive ad hoc SQL queries against data in Amazon S3; the feature you mention isn't supported yet.
https://docs.aws.amazon.com/athena/latest/ug/supported-format.html
You have two options in this situation:
1) Use the Glue or EMR service.
https://docs.aws.amazon.com/glue/latest/dg/console-connections.html
2) Export the data from Aurora to S3 and then query it with Athena.
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.SaveIntoS3.html
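A minimal sketch of option 2, assuming the Aurora cluster has an IAM role attached that allows writing to the bucket (see the SaveIntoS3 doc above); the host, credentials, table, and S3 prefix are placeholders, and the statement is issued through pymysql purely for illustration.

```python
import pymysql

# Placeholder connection details for the Aurora MySQL cluster.
conn = pymysql.connect(
    host="my-aurora-cluster.cluster-xxxx.us-east-1.rds.amazonaws.com",
    user="export_user",
    password="...",
    database="appdb",
)

with conn.cursor() as cur:
    # Export the table to S3 as CSV so Athena can query the resulting files.
    cur.execute(
        """
        SELECT * FROM orders
        INTO OUTFILE S3 's3://my-athena-staging/orders/export'
        FORMAT CSV
        OVERWRITE ON
        """
    )
conn.close()
```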

Migrate database and data warehouse into AWS

I want to migrate our database and data warehouse into AWS.
Our on-prem database is Oracle and we use Oracle Data Integrator for data warehousing in IBM AIX.
My first thought was to migrate our database with AWS DMS (Database Migration Service) into a staging area (S3), then use Lambda (to trigger whenever data is inserted, updated, or deleted) and Kinesis Firehose (for streaming and ETL) to send the data into Redshift.
The data in Redshift must be a replica of our on-prem data warehouse (containing facts and dimensions, aggregations, and multiple joins), and whenever anything changes in the on-prem database I want S3 and Redshift to update automatically, so that I have near real-time data in Redshift.
I was wondering if my architecture is correct and/or is there a better way to do it?
Thank you
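To illustrate the Lambda piece of that proposed pipeline, here is a minimal sketch of a handler triggered by S3 ObjectCreated events that forwards newly landed records to a Firehose delivery stream (which in turn COPYs into Redshift). The stream name, bucket layout, and line-per-record format are assumptions, not part of the original question.

```python
import boto3

s3 = boto3.client("s3")
firehose = boto3.client("firehose")

def handler(event, context):
    # Invoked once per S3 ObjectCreated event for files DMS drops into staging.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

        # One record per line (assumption about the staged file format).
        records = [{"Data": line + b"\n"} for line in body.splitlines() if line]

        # PutRecordBatch accepts at most 500 records per call.
        for i in range(0, len(records), 500):
            firehose.put_record_batch(
                DeliveryStreamName="to-redshift",  # placeholder stream name
                Records=records[i : i + 500],
            )
```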

Sync data from Amazon Aurora to Redshift

I am trying to setup a sync between AWS Aurora and Redshift. What is the best way to achieve this sync?
Possible ways to sync:
1) Query the table to find changes (since I am only doing inserts, updates don't matter), export these changes to a flat file in an S3 bucket, and use the Redshift COPY command to load them into Redshift.
2) Use a Python publisher with Boto3 to publish changes into a Kinesis stream, then consume that stream in Firehose, from where I can copy directly into Redshift (see the sketch after this list).
3) Use the Kinesis Agent to detect changes in the binlog (is it possible to detect binlog changes with the Kinesis Agent?), publish them to Firehose, and from there copy into Redshift.
I haven't explored AWS Data Pipeline yet.
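A minimal sketch of option 2 from the list above: a small Boto3 publisher that pushes newly inserted rows into a Kinesis stream for Firehose to pick up. The stream name and the way new rows are detected are placeholders.

```python
import json
import boto3

kinesis = boto3.client("kinesis")

# Rows selected since the last high-water mark (placeholder data).
new_rows = [
    {"id": 101, "amount": 12.5},
    {"id": 102, "amount": 7.0},
]

for row in new_rows:
    # One Kinesis record per inserted row; Firehose consumes the stream
    # and COPYs the buffered data into Redshift.
    kinesis.put_record(
        StreamName="aurora-changes",  # placeholder stream name
        Data=json.dumps(row).encode("utf-8"),
        PartitionKey=str(row["id"]),
    )
```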
As pointed out by Mark B, the AWS Database Migration Service can migrate data between databases. This can be done as a one-off exercise, or it can run continuously, keeping the two databases in sync.
The documentation shows that Amazon Aurora can be a source and Amazon Redshift can be a target.
AWS has just announced this new feature: Amazon Aurora zero-ETL integration with Amazon Redshift
This natively provides near real-time (within seconds) synchronization from Aurora to Redshift.
You can also use federated queries: https://docs.aws.amazon.com/redshift/latest/dg/federated-overview.html