How to load files located On-Prem to AWS using AWS Glue - amazon-web-services

Can I directly load files located in an on-prem location into RDS using AWS Glue?
Also, if I have to park the files in S3 before loading, what options do I have apart from using the CLI?

Did you check out AWS Database Migration Service (DMS)? With DMS you do not need AWS Glue.
If you prefer not to use AWS DMS and go through S3 instead, you can use an S3 client such as CloudBerry to move the files from on-prem to S3, and then use the RDS command "LOAD DATA FROM S3 ..." in your AWS Glue script to insert the data into the RDS tables (MySQL only).
For the LOAD DATA FROM S3 command, please refer here: S3 to RDS MySQL
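As a sketch of what that can look like (endpoint, names, and credentials below are placeholders; it assumes the DB instance has an IAM role that allows reading from the bucket and an engine version that supports LOAD DATA FROM S3), the statement can be issued from Python with pymysql:

# Sketch only: executes the MySQL "LOAD DATA FROM S3" statement against RDS/Aurora MySQL.
# Assumes an IAM role granting S3 read access is associated with the instance and the
# corresponding parameter (e.g. aws_default_s3_role) is set.
import pymysql

conn = pymysql.connect(
    host="mydb.xxxxxxxx.us-east-1.rds.amazonaws.com",  # placeholder endpoint
    user="admin",                                      # placeholder user
    password="<password>",
    database="mydb",
)
with conn.cursor() as cur:
    cur.execute("""
        LOAD DATA FROM S3 's3://my-bucket/incoming/data.csv'
        INTO TABLE mytable
        FIELDS TERMINATED BY ','
        LINES TERMINATED BY '\\n'
        IGNORE 1 LINES
    """)
conn.commit()
conn.close()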

You can load S3 data into RDS using an AWS Glue crawler and job.
But to get the on-premises data into S3 in the first place: if your data is not that large, you can upload the files via a script or program (see the sketch below); if your data is very large, go for AWS Database Migration Service.
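For the script-or-program route, a minimal boto3 upload sketch (bucket name and directory are placeholders; credentials come from the standard AWS credential chain):

# Minimal sketch: upload every file in a local on-prem directory to S3 with boto3.
import os
import boto3

s3 = boto3.client("s3")
BUCKET = "my-landing-bucket"   # placeholder bucket
SOURCE_DIR = "/data/exports"   # placeholder local directory

for name in os.listdir(SOURCE_DIR):
    local_path = os.path.join(SOURCE_DIR, name)
    if os.path.isfile(local_path):
        s3.upload_file(local_path, BUCKET, f"incoming/{name}")
        print(f"uploaded {local_path} to s3://{BUCKET}/incoming/{name}")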

Related

Can I use s3fs to perform "free data transfer" between AWS EC2 and S3?

I am looking to deploy a Python Flask app on an AWS EC2 (Ubuntu 20.04) instance. The app fetches data from an S3 bucket (in the same region as the EC2 instance) and performs some data processing.
I prefer using s3fs to achieve the connection to my S3 bucket. However, I am unsure if this will allow me to leverage the 'free data transfer' from S3 to EC2 in the same region - or if I must use boto directly to facilitate this transfer?
My app works when deployed with s3fs, but I would have expected the data transfer to be much faster, so I am wondering whether EC2 is perhaps not able to "correctly" fetch data from S3 using s3fs.
All communication between Amazon EC2 and Amazon S3 in the same region will not incur a Data Transfer fee. It does not matter which library you are using.
In fact, communication between any AWS services in the same region will not incur Data Transfer fees.
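For reference, a minimal s3fs read from an EC2 instance looks like the sketch below (bucket and key are placeholders); a boto3 download would be billed exactly the same way, so any speed difference is not about transfer fees:

# Sketch: read an S3 object with s3fs from EC2 in the same region.
import s3fs

fs = s3fs.S3FileSystem()  # uses the instance role / default credential chain
with fs.open("my-bucket/path/to/data.csv", "rb") as f:  # placeholder bucket/key
    data = f.read()
print(f"read {len(data)} bytes")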

Is there a way to export data from S3 to Amazon Aurora serverless with lambda?

So I was searching for a solution that could let me export S3 data into Aurora Serverless. I know that the LOAD DATA request is only available for Aurora clusters, not the serverless one.
I've found some documentation about data ingestion from S3 into RDS MySQL, but I don't know whether this still applies to Amazon Aurora MySQL.
If you're OK with temporarily staging the data on an EC2 instance, you can do it in two steps:
aws s3 cp s3://path/to/mydatafile /local/path/to/mydatafile
mysql --defaults-file=/path/to/.my.cnf -e "load data local infile '/local/path/to/mydatafile' into table sampletable"
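The same two steps can also be scripted, for example in Python with boto3 and pymysql (a sketch only; endpoint, credentials, and paths are placeholders, and LOAD DATA LOCAL INFILE requires local_infile to be enabled on both the client connection and the database):

# Sketch: copy a file from S3 to local disk, then load it into a MySQL-compatible table.
import boto3
import pymysql

BUCKET = "my-bucket"            # placeholder
KEY = "path/to/mydatafile"      # placeholder
LOCAL_PATH = "/tmp/mydatafile"

# Step 1: copy the object from S3 to the local filesystem
boto3.client("s3").download_file(BUCKET, KEY, LOCAL_PATH)

# Step 2: load the local file into the table
conn = pymysql.connect(
    host="mycluster.cluster-xxxx.us-east-1.rds.amazonaws.com",  # placeholder endpoint
    user="admin",
    password="<password>",
    database="mydb",
    local_infile=True,
)
with conn.cursor() as cur:
    cur.execute(f"LOAD DATA LOCAL INFILE '{LOCAL_PATH}' INTO TABLE sampletable")
conn.commit()
conn.close()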
References
StackOverflow discussion on loading data
MySQL "load data" reference
Copying from s3
Using MySQL options files

Python script to load data from AWS S3 to Redshift

Has anybody worked on creating a Python script to load data from S3 into Redshift tables for multiple files? How can we achieve it with the AWS CLI? Your learnings and inputs on this are appreciated.
The COPY command is the best way to load data from Amazon S3 into Amazon Redshift. It can load multiple files in parallel into a single table.
Use any Python library (eg PostgreSQL + Python | Psycopg) to connect to Amazon Redshift, then issue the COPY command.
The AWS Command-Line Interface (CLI) does not have the ability to run the COPY command on Redshift because it needs to be issued to the database, while the AWS CLI issues commands to AWS. (The AWS CLI can be used to launch/terminate a Redshift cluster, but not to connect to the cluster itself.)
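A minimal sketch with psycopg2 (cluster endpoint, credentials, bucket, and IAM role below are placeholders; the COPY options depend on your file format):

# Sketch: connect to Redshift with psycopg2 and run COPY to load all files under an S3 prefix.
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.xxxxxxxx.us-east-1.redshift.amazonaws.com",  # placeholder endpoint
    port=5439,
    dbname="dev",
    user="awsuser",
    password="<password>",
)
copy_sql = """
    COPY my_table
    FROM 's3://my-bucket/data/prefix/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    FORMAT AS CSV
    IGNOREHEADER 1;
"""
with conn, conn.cursor() as cur:
    cur.execute(copy_sql)   # loads every file under the prefix, in parallel, into my_table
conn.close()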

Best practice for logging on Elastic MapReduce - AWS

I am planning to use Amazon EMR for a Spark Streaming application. Amazon provides a nice interface to show the stderr & controller logs, but for a streaming application I am not sure how to manage the logs.
Amazon logs the data to /var/log/hadoop/steps/<step-id> and similar places for spark. http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-manage-view-web-log-files.html
I was wondering how we can rotate logs and still have them accessible via the AWS EMR web interface. We can easily change the log rotation policy by configuring hadoop-log4j, but that way I cannot access the logs via the web interface. EMR should also manage the log upload to S3.
AWS EMR also stores the logs in S3.
Navigate to your cluster console for the running cluster, and in the left middle column, you'll see the path to the s3 bucket.
Be careful not to reuse the same s3 bucket path for future clusters, otherwise you could overwrite your log data.
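If you prefer to look the log location up programmatically rather than in the console, a small boto3 sketch (region and cluster id are placeholders):

# Sketch: fetch the S3 log path (LogUri) configured for an EMR cluster.
import boto3

emr = boto3.client("emr", region_name="us-east-1")                      # placeholder region
cluster = emr.describe_cluster(ClusterId="j-XXXXXXXXXXXXX")["Cluster"]  # placeholder id
print("Logs are written under:", cluster.get("LogUri"))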

Backing up aws RDS oracle database to s3 bucket

I have my Oracle database running on Amazon Web Services (AWS) in an RDS instance.
The total size of the database would be less than 100 GB. I am planning to take a backup of the entire database; in particular, I want to back the database up to S3.
Can anyone suggest a solution to achieve this?
For backups within AWS you can use the snapshot function of RDS. Snapshots can be used to restore your database to a point in time.
See http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Overview.BackingUpAndRestoringAmazonRDSInstances.html
However, you can not download these snapshots or access them physically.
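For completeness, creating such a snapshot from Python might look like this sketch (region and identifiers are placeholders; as noted above, the snapshot stays inside RDS and cannot be downloaded):

# Sketch: create a manual RDS snapshot of the Oracle instance with boto3.
import boto3

rds = boto3.client("rds", region_name="us-east-1")  # placeholder region
rds.create_db_snapshot(
    DBInstanceIdentifier="my-oracle-instance",                    # placeholder
    DBSnapshotIdentifier="my-oracle-instance-manual-backup-001",  # placeholder
)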
If you want backups on S3, you have to dump the database and upload the dumps to S3.
See
http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Oracle.Procedural.Importing.html
Download MySql Backup/Snapshot from Amazon RDS