Monitoring an S3 bucket and downloading any new files continuously - amazon-web-services

Is there any way I can monitor an S3 bucket for new files added to it using boto3? Once a new file is added to the S3 bucket, it needs to be downloaded.
My Python code needs to run on an external VMC Server, which is not hosted on an AWS EC2 instance. Whenever a vendor pushes a new file to our public S3 bucket, I need to download those files to this VMC Server for ingestion into our on-prem databases/servers. I can't access the VMC Server from AWS either, and no webhook is available.
I have written the code for downloading the files; however, how can I monitor an S3 bucket for new files?

Take a look at S3 Event Notifications: https://docs.aws.amazon.com/AmazonS3/latest/userguide/NotificationHowTo.html
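Since the VMC Server can't receive webhooks, one pattern that still works from outside AWS is to have the bucket publish its event notifications to an SQS queue and poll that queue with boto3. A minimal sketch, assuming a queue (here called new-files-queue) has already been wired up as the bucket's notification target; the queue name, region, and download directory are placeholders:

import json
import boto3
from urllib.parse import unquote_plus

REGION = "us-east-1"             # assumption: adjust to your region
QUEUE_NAME = "new-files-queue"   # assumption: queue configured as the bucket's notification target
DOWNLOAD_DIR = "/data/incoming"  # assumption: local directory on the VMC Server

sqs = boto3.resource("sqs", region_name=REGION)
s3 = boto3.client("s3", region_name=REGION)
queue = sqs.get_queue_by_name(QueueName=QUEUE_NAME)

while True:
    # Long-poll the queue; returns as soon as a notification arrives.
    for message in queue.receive_messages(WaitTimeSeconds=20, MaxNumberOfMessages=10):
        body = json.loads(message.body)
        for record in body.get("Records", []):  # the initial s3:TestEvent has no Records
            bucket = record["s3"]["bucket"]["name"]
            key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
            s3.download_file(bucket, key, f"{DOWNLOAD_DIR}/{key.replace('/', '_')}")
        message.delete()  # acknowledge so the event isn't delivered again

SQS is a pull-based API, so the VMC Server only needs outbound HTTPS access to AWS; nothing has to reach the server from outside.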

Related

Downloading files from an S3 bucket to another server

I have a requirement that whenever files are placed in an S3 bucket, they need to be moved to an external VMC server. I have written a boto3 script that can download these files.
But how do I automate this process, so that whenever new files land in the S3 bucket, the script runs on the external VMC server and downloads them there?
Is there any way I can do that?
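One low-tech option is for the external server to poll the bucket on a schedule and download anything it hasn't seen before. A rough sketch with boto3, where the bucket name, download directory, and poll interval are placeholders:

import time
import boto3

BUCKET = "vendor-public-bucket"   # assumption: the bucket the files land in
DOWNLOAD_DIR = "/data/incoming"   # assumption: local directory on the VMC server
POLL_SECONDS = 60                 # assumption: how often to check

s3 = boto3.client("s3")
seen = set()  # keys already downloaded; persist this to disk to survive restarts

while True:
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key not in seen:
                s3.download_file(BUCKET, key, f"{DOWNLOAD_DIR}/{key.replace('/', '_')}")
                seen.add(key)
    time.sleep(POLL_SECONDS)

Polling is cheap, but it won't pick up new files as quickly as an event-driven approach.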

Accessing Amazon S3 via FTP?

I have done a number of searches and can't seem to work out whether this is doable at all.
I have a data logger that has an FTP-push function. The FTP-push function has the following settings:
FTP server
Port
Upload directory
User name
Password
In general, I understand that a FileZilla client (I have a Pro edition) is able to drop files into my AWS S3 bucket, and I have done this successfully from my local PC.
Is it possible to remove the Filezilla client requirement and input my S3 information directly into my data logger? Something like the below diagram:
Data logger ----FTP----> S3 bucket
If not, what will be the most sensible method to have my data logger JSON files drop into AWS S3 via FTP?
Frankly, you'd be better off with:
Logging to local files
Using a schedule to copy the log files to Amazon S3 using the aws s3 sync command
The schedule could be triggered by cron (Linux) or a Scheduled Task (Windows).
Amazon recently added FTP support to AWS Transfer Family. This provides an integration with Amazon S3 via FTP without setting up any additional infrastructure; however, you should review the pricing before committing.
As an alternative, you could create an intermediary server that syncs between itself and Amazon S3 using the AWS CLI's aws s3 sync command.
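If installing the AWS CLI on the logger host isn't convenient, a short boto3 script run from cron can do the same copy. A minimal sketch, assuming the logger writes to a local ./logs directory and a placeholder bucket name; it skips files that were already uploaded on a previous run:

import os
import boto3
from botocore.exceptions import ClientError

BUCKET = "my-log-bucket"   # assumption: placeholder bucket name
LOG_DIR = "./logs"         # assumption: directory the data logger writes to

s3 = boto3.client("s3")

for name in os.listdir(LOG_DIR):
    path = os.path.join(LOG_DIR, name)
    if not os.path.isfile(path):
        continue
    try:
        s3.head_object(Bucket=BUCKET, Key=name)   # already in the bucket from an earlier run
    except ClientError:
        s3.upload_file(path, BUCKET, name)        # new log file: push it to S3

A cron entry (or a Scheduled Task on Windows) then runs the script on whatever interval suits the logger.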

How do I sync an AWS S3 bucket with files on a remote non-AWS server?

I want to sync an AWS S3 bucket up with files on a remote non-AWS server. I have proper access to both the remote server and the EC2 instance that has access to the S3 bucket. What is the best way to do this?
I looked at the docs for the aws s3 sync command, and it looks like you can only sync up an S3 bucket with files locally on the server that has access to the S3 bucket.
The problem is, I have files on a remote server that I want to sync up with an S3 bucket, but that server is not an AWS EC2 instance.
I am able to use the rsync command to get the files from the remote server onto the AWS server that has access to the S3 bucket, but if I run the rsync command and then the aws s3 sync command, it becomes a two-step process to move the files that takes about twice as long, and because there are a lot of files, I'd also have to increase the size of the EC2 instance volume to hold all the files at once. None of that is ideal.
As such, is there a way to sync up an S3 bucket with a remote server that is not an AWS server and that does not have access to the S3 bucket by using an EC2 instance that does have access to the S3 bucket as an intermediary? Thank you.
The simplest approach would be to use sshfs. The basic process is as follows:
Create a local directory where you'll mount the remote system, such as /tmp/syncmount
Run sshfs USER@REMOTE:DIRECTORY /tmp/syncmount
Run aws s3 sync /tmp/syncmount s3://YOUR_BUCKET
I'm assuming that if you have rsync access you'll have general SSH access.

Simplest way to fetch a file from an FTP server (on-prem) & put it into an S3 bucket

As per my project requirement, I want to fetch some files from an on-prem FTP server and put them into an S3 bucket. The files are 1-2 GB in size. Once a file is placed in the FTP server folder, I want it to be uploaded to the S3 bucket.
Please suggest the easiest way to achieve this.
Note: the files will mostly be put on the FTP server only once a day, so I don't want to continuously scan the FTP server. Once the files have been uploaded to S3 from the FTP server, I want to terminate any resources (like EC2) created in AWS.
These are my ideas:
I think you could create an agent on your FTP server that uploads the files every N seconds/minutes/hours/etc. using the AWS CLI. This way you avoid external access to your FTP server.
Another approach is a Lambda function for the pull process, but as you said, the FTP server doesn't allow external access.
Create a VPN between your on-prem network and the cloud infrastructure, create a CloudWatch Events rule, and run the pull process through a Lambda function (where you can configure a timeout).
Create a VPN between your on-prem network and the cloud infrastructure, and upload the files from your FTP server using the AWS CLI (pay attention to the sync option). Take a look at this link: https://aws.amazon.com/answers/networking/accessing-vpc-endpoints-from-remote-networks/
With Jenkins, create a task that executes a process to upload the files.
You can use AWS Storage Gateway; see https://aws.amazon.com/es/storagegateway/
Here is how we solved it.
Enable S3 Transfer Acceleration on your S3 bucket. This is very much needed, since you are pushing large files.
If you have access to the server, install the AWS CLI and sync the folder to the S3 bucket. The AWS CLI will automatically keep your folder in sync with the bucket, so if you change any of your existing files, the changes are picked up as well. This is the ideal and simplest way if you have access to the server and are able to install the AWS CLI.
https://docs.aws.amazon.com/AmazonS3/latest/dev/transfer-acceleration-examples.html#transfer-acceleration-examples-aws-cli
aws s3api put-bucket-accelerate-configuration --bucket bucketname --accelerate-configuration Status=Enabled
If you want to enable the accelerate endpoint for a specific or the default profile:
aws configure set default.s3.use_accelerate_endpoint true
If you don't have access to the FTP server on your premises, you need an external server to perform this process. In that case you need to poll or share the file system, copy the file locally, and move it to the S3 bucket. There will be a lot of failure points with this process.
Hope it helps.
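If you script the upload with boto3 instead of the CLI, you can also point the client at the accelerate endpoint once the bucket has acceleration enabled. A minimal sketch with a placeholder bucket and file path:

import boto3
from botocore.config import Config

BUCKET = "bucketname"              # assumption: the bucket with acceleration enabled
FILE_PATH = "/data/bigfile.bin"    # assumption: the 1-2 GB file pulled from the FTP server

# Route requests through the S3 Transfer Acceleration endpoint.
s3 = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))

# upload_file performs managed multipart transfers, which suits files this large.
s3.upload_file(FILE_PATH, BUCKET, "bigfile.bin")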

AWS Static web hosting - tedious to update site

I'm using AWS to host a static website. Unfortunately, it's very tedious to upload the directory to S3. Is there any way to streamline the process?
Have you considered using the AWS CLI (AWS Command Line Interface) to interact with AWS services and resources?
Once you install and configure the AWS CLI, all you need to do to update the site is:
aws s3 sync /local/dev/site s3://my-website-bucket
This way you can continue developing the static site locally, and a simple aws s3 sync call will look at which files have changed since the last sync and upload them to S3 without any fuss.
To make the newly created objects public (if this isn't done via a bucket policy):
aws s3 sync /local/dev/site s3://my-website-bucket --acl public-read
The best part is that multipart upload is built in. You can also sync back from S3 to local (the reverse direction).
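If you ever need to do the same push from a script rather than the CLI, the upload half of aws s3 sync can be approximated with boto3. A rough sketch, using the same placeholder paths as above; unlike the CLI, it re-uploads everything instead of diffing against the bucket:

import mimetypes
import os
import boto3

BUCKET = "my-website-bucket"   # assumption: placeholder bucket name from the answer
SITE_DIR = "/local/dev/site"   # assumption: local root of the static site

s3 = boto3.client("s3")

for root, _dirs, files in os.walk(SITE_DIR):
    for name in files:
        path = os.path.join(root, name)
        key = os.path.relpath(path, SITE_DIR).replace(os.sep, "/")
        content_type = mimetypes.guess_type(name)[0] or "binary/octet-stream"
        # ACL mirrors the --acl public-read flag; drop it if a bucket policy handles access
        s3.upload_file(
            path, BUCKET, key,
            ExtraArgs={"ACL": "public-read", "ContentType": content_type},
        )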