I'm currently working with an AWS Kinesis data stream, and I would like to join the stream to a MySQL table in order to enrich the data. Is it possible to create a MySQL connection in an Apache Zeppelin notebook, and then join that data to the Kinesis data stream?
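What I have in mind is roughly the following in a %pyspark paragraph (a sketch only: it assumes the MySQL JDBC driver is on the Spark interpreter's classpath and that a streaming DataFrame events_df has already been created from the Kinesis stream, e.g. via the Kinesis connector bundled with the EMR Spark build; all hosts, tables and columns are made up):

    %pyspark
    # Sketch: enrich a Kinesis-backed streaming DataFrame with a MySQL lookup table.
    # Assumes `events_df` is an existing streaming DataFrame built from the Kinesis
    # stream; host, database, table and column names are hypothetical.

    # 1) Load the MySQL lookup table as a static DataFrame over JDBC.
    customers_df = (spark.read
        .format("jdbc")
        .option("url", "jdbc:mysql://my-mysql-host:3306/mydb")
        .option("dbtable", "customers")
        .option("user", "myuser")
        .option("password", "mypassword")
        .option("driver", "com.mysql.jdbc.Driver")
        .load())

    # 2) Enrich the stream with a stream-static join (supported by Structured Streaming).
    enriched_df = events_df.join(customers_df, on="customer_id", how="left")

    # 3) Write the enriched stream out (console sink shown only for illustration).
    query = (enriched_df.writeStream
        .format("console")
        .outputMode("append")
        .start())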
Related
It seems that the AWS Glue "Add Connection" option can only add a connection specific to one database. I need to connect to all of the databases on an MS SQL Server instance. Is it possible to cover multiple databases with one AWS Glue "Add Connection", or do we need a new connection for every database?
The JDBC connection string is limited to one database at a time.
From Glue's documentation:
For JDBC to connect to the data store, a db_name in the data store is required. The db_name is used to establish a network connection with the supplied username and password. When connected, AWS Glue can access other databases in the data store to run a crawler or run an ETL job.
https://docs.aws.amazon.com/glue/latest/dg/console-connections.html?icmpid=docs_glue_console
The database name is part of the JDBC URL. As you can have only one URL in a Glue connection, you can only point to one database. But you can still use all of the schemas under that database.
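For illustration, a hedged boto3 sketch of creating such a connection; the connection name, host, database ("SalesDB") and credentials are made up, but it shows how the database name is baked into the JDBC URL, which is why one connection maps to one database:

    import boto3

    glue = boto3.client("glue", region_name="us-east-1")

    glue.create_connection(
        ConnectionInput={
            "Name": "sqlserver-salesdb",          # hypothetical connection name
            "ConnectionType": "JDBC",
            "ConnectionProperties": {
                # db_name ("SalesDB") is part of the URL, so this connection is
                # scoped to that one database; other databases need their own
                # connection (with their own URL).
                "JDBC_CONNECTION_URL": "jdbc:sqlserver://my-host:1433;databaseName=SalesDB",
                "USERNAME": "glue_user",
                "PASSWORD": "secret",
            },
        }
    )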
What is the best practice for moving data from a Kafka cluster to a Redshift table?
We have data arriving continuously on Kafka, and I want to write it to tables in Redshift (it doesn't have to be in real time).
Should I use a Lambda function?
Should I write a Redshift connector (consumer) that runs on a dedicated EC2 instance? (The downside is that I would need to handle redundancy myself.)
Is there some AWS pipeline service for that?
Kafka Connect is commonly used for streaming data from Kafka to (and from) data stores. It does useful things like automagically managing scale-out, failover, schemas, serialisation, and so on.
This blog shows how to use the open-source JDBC Kafka Connect connector to stream to Redshift. There is also a community Redshift connector, but I've not tried this.
This blog shows another approach, not using Kafka Connect.
Disclaimer: I work for Confluent, who created the JDBC connector.
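For illustration, here is a minimal, hedged sketch of registering a JDBC sink connector with a Kafka Connect worker through its REST API; the Connect URL, topic name and Redshift endpoint are placeholders, and the Redshift JDBC driver needs to be on the worker's classpath:

    import json
    import requests

    connector_config = {
        "name": "redshift-sink",
        "config": {
            "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
            "topics": "orders",        # hypothetical topic; the target table name defaults to the topic
            "connection.url": "jdbc:redshift://my-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev",
            "connection.user": "awsuser",
            "connection.password": "secret",
            "insert.mode": "insert",
            "auto.create": "true",     # let the connector create the target table
            "pk.mode": "none",
        },
    }

    # Register the connector with the Connect worker's REST API (assumed to be on localhost:8083).
    resp = requests.post(
        "http://localhost:8083/connectors",
        headers={"Content-Type": "application/json"},
        data=json.dumps(connector_config),
    )
    resp.raise_for_status()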
The tool below provides a batch-import method for copying data from SQL Server RDS into Redshift.
AWS Schema Conversion Tool Exports from SQL Server to Amazon Redshift
Is there a more streamlined method, such as streaming data from MS SQL Server into Redshift with Kinesis Firehose every second? I know we can move AWS Aurora SQL directly into Redshift with Kinesis.
If your goal is to move data from Microsoft SQL Server into Amazon Redshift, then you could consider using AWS Database Migration Service (DMS). It can copy data as a one-off job, but it can also replicate changes on an ongoing basis.
See:
Using a Microsoft SQL Server Database as a Source for AWS DMS - AWS Database Migration Service
Using an Amazon Redshift Database as a Target for AWS Database Migration Service - AWS Database Migration Service
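As a rough illustration, a boto3 sketch of what the ongoing (full load plus CDC) piece could look like; it assumes the DMS source and target endpoints and a replication instance already exist, and all ARNs and names are placeholders:

    import json
    import boto3

    dms = boto3.client("dms", region_name="us-east-1")

    # Select every table in the dbo schema (placeholder mapping rules).
    table_mappings = {
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-dbo",
            "object-locator": {"schema-name": "dbo", "table-name": "%"},
            "rule-action": "include",
        }]
    }

    dms.create_replication_task(
        ReplicationTaskIdentifier="sqlserver-to-redshift",
        SourceEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:SOURCE",
        TargetEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:TARGET",
        ReplicationInstanceArn="arn:aws:dms:us-east-1:123456789012:rep:INSTANCE",
        # "full-load-and-cdc" does the one-off copy and then keeps replicating changes.
        MigrationType="full-load-and-cdc",
        TableMappings=json.dumps(table_mappings),
    )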
I want to migrate our database and data warehouse into AWS.
Our on-prem database is Oracle and we use Oracle Data Integrator for data warehousing in IBM AIX.
My first thought was to migrate our database with AWS DMS (Database Migration Service) into a staging point (S3), and then use Lambda (to create triggers when data is updated, deleted, or inserted) and Kinesis Firehose (for streaming and doing the ETL) to send the data into Redshift.
The data in Redshift must be a replica of our on-prem data warehouse (containing facts and dimensions, aggregations, and multiple joins), and whenever any change happens in the on-prem database I want it to automatically update S3 and Redshift, so that I have near-real-time data in Redshift.
I was wondering whether my architecture is correct and/or whether there is a better way to do it?
Thank you
I am confused about how to send data from a MySQL database deployed on an EC2 instance to an Amazon Redshift cluster.
What are the ways this task can be done?
Possible solutions:
The easiest solution should be "AWS Data Pipeline".
Alternatively: write the output of your SQL query to a CSV file --> compress it (e.g. gzip, if the data is huge) --> upload it to S3 --> use the Redshift COPY command to load all of these records in bulk into Redshift.
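A minimal sketch of that flow, assuming the dump has already been exported from MySQL and gzipped locally; the bucket, table, IAM role and connection details are placeholders:

    import boto3
    import psycopg2

    # 1) Upload the gzipped CSV dump to S3.
    s3 = boto3.client("s3")
    s3.upload_file("orders.csv.gz", "my-bucket", "exports/orders.csv.gz")

    # 2) Bulk-load it into Redshift with the COPY command.
    conn = psycopg2.connect(
        host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439, dbname="dev", user="awsuser", password="secret",
    )
    with conn, conn.cursor() as cur:
        cur.execute("""
            COPY public.orders
            FROM 's3://my-bucket/exports/orders.csv.gz'
            IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
            CSV GZIP;
        """)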