Writing S3 bucket CSV to Redshift using Kinesis - amazon-web-services

I have three types of CSV files in my S3 bucket and want to load each into its respective Redshift table based on the CSV file's prefix. Since a new file is dropped into S3 every 5 minutes, I am thinking of using Kinesis to stream the data into Redshift. I am new to AWS and have gone through the documentation, but I am not sure how to achieve this.
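One pattern that could achieve this (a sketch only; the bucket, prefixes, table names, cluster details, and IAM role below are placeholder assumptions) is to have each S3 drop trigger a Lambda function via an S3 event notification, which then issues a Redshift COPY for the table matching the key prefix:

# Sketch: route S3 CSV drops to Redshift tables by key prefix.
# All names, identifiers, and the IAM role ARN are placeholder assumptions.
import boto3

# Hypothetical mapping of S3 key prefixes to Redshift tables
PREFIX_TO_TABLE = {
    "orders/": "public.orders",
    "customers/": "public.customers",
    "products/": "public.products",
}

redshift_data = boto3.client("redshift-data")

def lambda_handler(event, context):
    """Triggered by an S3 ObjectCreated event; runs a COPY into the matching table."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        table = next((t for p, t in PREFIX_TO_TABLE.items() if key.startswith(p)), None)
        if table is None:
            continue  # unknown prefix, skip
        copy_sql = (
            f"COPY {table} FROM 's3://{bucket}/{key}' "
            "IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole' "  # placeholder role
            "FORMAT AS CSV IGNOREHEADER 1;"
        )
        redshift_data.execute_statement(
            ClusterIdentifier="my-cluster",  # placeholder
            Database="dev",                  # placeholder
            DbUser="awsuser",                # placeholder
            Sql=copy_sql,
        )

Kinesis Data Firehose can also deliver to Redshift, but a single delivery stream copies into a single table, so the Lambda-plus-COPY pattern above is one way to fan files out to several tables.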

Related

Sync DynamoDB into Redshift table with Kinesis Firehose

I've found a tutorial here on how to get item-level changes from DynamoDB into S3 via Kinesis Firehose, but how do I get those into a Redshift table? If an item is updated, a new record is created for it and posted to S3, so is there a tutorial or guidance on how to take these item-level changes and read them into a table?
Kinesis Firehose has multiple destinations to choose from. S3 is only one of them; Redshift is another.
You can set up Redshift as the destination when you create the delivery stream.
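For example, a minimal boto3 sketch of such a delivery stream might look like the following (all ARNs, the JDBC URL, credentials, and table names are placeholder assumptions, not values from the question):

# Sketch: create a Kinesis Data Firehose delivery stream that COPYs into Redshift.
# Every ARN, URL, credential, and name below is a placeholder assumption.
import boto3

firehose = boto3.client("firehose")

firehose.create_delivery_stream(
    DeliveryStreamName="dynamodb-changes-to-redshift",  # placeholder
    DeliveryStreamType="DirectPut",
    RedshiftDestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/FirehoseRedshiftRole",
        "ClusterJDBCURL": "jdbc:redshift://my-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev",
        "CopyCommand": {
            "DataTableName": "public.item_changes",      # placeholder target table
            "CopyOptions": "FORMAT AS JSON 'auto'",
        },
        "Username": "awsuser",      # placeholder
        "Password": "REPLACE_ME",   # placeholder
        # Firehose stages the data in S3 and then issues the COPY into Redshift
        "S3Configuration": {
            "RoleARN": "arn:aws:iam::123456789012:role/FirehoseRedshiftRole",
            "BucketARN": "arn:aws:s3:::my-firehose-staging-bucket",
        },
    },
)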

Data Pipeline (DynamoDB to S3) - How to format S3 file?

I have a Data Pipeline that exports my DynamoDB table to an S3 bucket so I can use the S3 file for services like QuickSight, Athena and Forecast.
However, for my S3 file to work with these services, I need the file to be formatted as a CSV, like so:
date, journal, id
1589529457410, PLoS Genetics, 10.1371/journal.pgen.0030110
1589529457410, PLoS Genetics, 10.1371/journal.pgen.1000047
But instead, my exported file looks like this:
{"date":{"s":"1589529457410"},"journal":{"s":"PLoS Genetics"},"id":{"s":"10.1371/journal.pgen.0030110"}}
{"date":{"s":"1589833552714"},"journal":{"s":"PLoS Genetics"},"id":{"s":"10.1371/journal.pgen.1000047"}}
How can I specify the format for my exported file in S3 so I can operate with services like QuickSight, Athena and Forecast? I'd preferably do the data transformation using Data Pipeline as well.
Athena can read JSON data.
You can also use DynamoDB Streams to stream the data to S3. Here is a link to a blog post with best practices and design patterns for streaming data from DynamoDB to S3 for use with Athena.
You can use DynamoDB streams to trigger an AWS Lambda function, which can transform the data and store it in Amazon S3, Amazon Redshift etc. With AWS Lambda you could also trigger Amazon Forecast to retrain, or pass the data to Amazon Forecast for a prediction.
Alternatively, you could use AWS Data Pipeline to write the data to an S3 bucket as you currently do, and then use a scheduled CloudWatch Events rule or an S3 event notification to trigger a Lambda function. The Lambda function can transform the file and store it in another S3 bucket for further processing.
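As a rough sketch of that transformation step, assuming the three string attributes from the export shown above and placeholder bucket names:

# Sketch: Lambda that flattens a DynamoDB-JSON export (one JSON object per line)
# into a CSV file in another bucket. Bucket names and the column list are placeholders.
import csv
import io
import json
import boto3

s3 = boto3.client("s3")
COLUMNS = ["date", "journal", "id"]       # columns from the question's example
OUTPUT_BUCKET = "my-csv-output-bucket"    # placeholder

def lambda_handler(event, context):
    for record in event["Records"]:       # S3 event notification
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

        out = io.StringIO()
        writer = csv.writer(out)
        writer.writerow(COLUMNS)
        for line in body.splitlines():
            if not line.strip():
                continue
            item = json.loads(line)
            # Each attribute in the export looks like {"s": "value"}; unwrap it.
            writer.writerow([item[col]["s"] for col in COLUMNS])

        s3.put_object(
            Bucket=OUTPUT_BUCKET,
            Key=key.rsplit(".", 1)[0] + ".csv",
            Body=out.getvalue().encode("utf-8"),
        )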

Migrating application to AWS Kinesis Firehose and AWS S3 multiple tables

I need to upgrade my application to handle streaming data. My application works with different kinds of data that are stored in different MySQL tables.
So I want to create an AWS Kinesis Firehose delivery stream and an AWS Lambda function to receive, transform and load my data to S3 as CSV files.
All the information I have found while googling explains very well how to implement this, but only when storing the data in one single CSV file. I assume that a single CSV file will be interpreted by Athena as one table.
I have not found any information on creating and storing multiple CSV files (which would represent tables in Athena) using Kinesis Firehose and an AWS Lambda function.
Should I create a new Kinesis Firehose delivery stream for each table in my MySQL database, or is there some way to store this data in different CSV files?
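For reference, the Firehose data-transformation Lambda referred to above follows the contract sketched below; flattening a JSON record into a CSV line is shown only as an illustration, and the field list is an assumption:

# Sketch: Kinesis Data Firehose data-transformation Lambda.
# Firehose passes base64-encoded records and expects the same recordIds back.
# The JSON fields flattened to CSV here are placeholder assumptions.
import base64
import json

FIELDS = ["id", "name", "created_at"]   # placeholder column order

def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        csv_line = ",".join(str(payload.get(f, "")) for f in FIELDS) + "\n"
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",             # or "Dropped" / "ProcessingFailed"
            "data": base64.b64encode(csv_line.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}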

AWS Glue ETL: transfer data to S3 bucket

I wish to transfer data from a database like MySQL [RDS] to S3 using AWS Glue ETL.
I am having difficulty doing this, and the documentation is really not good.
I found this link here on Stack Overflow:
Could we use AWS Glue just copy a file from one S3 folder to another S3 folder?
Based on that link, it seems that Glue does not support an S3 bucket as a data destination, only as a data source.
I hope I am wrong about this, because for an ETL tool on AWS, one of the first basics should be transferring data to and from an S3 bucket, the major form of storage on AWS.
So I hope someone can help with this.
You can add a Glue connection to your RDS instance and then use the Spark ETL script to write the data to S3.
You'll have to first crawl the database table using Glue Crawler. This will create a table in the Data Catalog which can be used in the job to transfer the data to S3. If you do not wish to perform any transformation, you may directly use the UI steps for autogenerated ETL scripts.
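As a rough illustration of such a Spark ETL script (the catalog database, table, and bucket names are placeholders):

# Sketch: Glue Spark ETL job that reads a cataloged RDS/MySQL table and writes CSV to S3.
# The catalog database, table, and bucket names below are placeholder assumptions.
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glueContext = GlueContext(SparkContext())
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Table previously created in the Data Catalog by the Glue Crawler
source = glueContext.create_dynamic_frame.from_catalog(
    database="my_rds_catalog_db",   # placeholder
    table_name="my_mysql_table",    # placeholder
)

glueContext.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://my-target-bucket/exports/"},  # placeholder
    format="csv",
)

job.commit()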
I have also written a blog on how to Migrate Relational Databases to Amazon S3 using AWS Glue. Let me know if it addresses your query.
https://ujjwalbhardwaj.me/post/migrate-relational-databases-to-amazon-s3-using-aws-glue
Have you tried https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-template-copyrdstos3.html?
You can use AWS Data Pipeline - it has standard templates for full as well as incremental copies from RDS to S3.

Convert csv file into DynamoDB json format

I have 7+ million records stored in a CSV file hosted in an AWS S3 bucket, and I want to load them into a DynamoDB table. I've tried the AWS Data Pipeline service, but the job always failed because the service doesn't support importing CSV format.
So I should first convert the CSV data into a format that DynamoDB can understand. Is there any way to make this conversion?
The AWS Data Pipeline service supports CSV import to DynamoDB. You can create a pipeline from the AWS console for Data Pipeline and choose "Import DynamoDB backup data from S3" to import CSV stored in S3 into DynamoDB.
See also
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBPipeline.html#DataPipelineExportImport.Importing
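If you prefer to perform the conversion and load yourself rather than through Data Pipeline, a minimal boto3 sketch could look like the following; the bucket, key, and table names are placeholders, and Table.batch_writer() accepts plain Python dicts, handling the DynamoDB type serialization and request batching for you:

# Sketch: stream a CSV from S3 and batch-write the rows into DynamoDB.
# Bucket, key, and table names are placeholder assumptions; all values are written as strings.
import csv
import boto3

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("my-target-table")   # placeholder

def load_csv_to_dynamodb(bucket, key):
    body = s3.get_object(Bucket=bucket, Key=key)["Body"]
    lines = (line.decode("utf-8") for line in body.iter_lines())
    reader = csv.DictReader(lines)           # first CSV row is treated as the header
    with table.batch_writer() as batch:      # sends writes in batches of up to 25 items
        for row in reader:
            batch.put_item(Item=row)

load_csv_to_dynamodb("my-source-bucket", "data/records.csv")   # placeholder names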