Output AWS CLI "sync" results to a txt file - amazon-web-services

I'm new to AWS and specifically to the AWS CLI tool, but so far I seem to be going OK.
I'm using the following commands to connect to AWS S3 and synchronise a local directory to my S3 bucket:
set AWS_ACCESS_KEY_ID=AKIAIMYACCESSKEY
set AWS_SECRET_ACCESS_KEY=NLnfMySecretAccessCode
set AWS_DEFAULT_REGION=ap-southeast-2
aws s3 sync C:\somefolder\Data\Dist\ s3://my.bucket/somefolder/Dist/ --delete
This is uploading files OK and displaying the progress and result for each file.
Once the initial upload is done, I'm assuming that all new syncs will just upload new and modified files and folders. Using the --delete flag will remove anything in the bucket that no longer exists on the local server.
I'd like to be able to output the results of each upload (or download, in the case of the other servers that will be getting a copy of what is being uploaded) to a .txt file on the local computer, so that I can use blat.exe to email the contents to someone who will be monitoring the sync.
All of this will be put into a batch file that will be scheduled to run nightly.
Can the output to .txt be done? If so, how?

I haven't tested this myself, but I found some resources indicating that you can redirect output from command-line applications in the Windows command prompt just as you would on Linux (a fuller batch-file sketch follows the links below).
aws s3 sync C:\somefolder\Data\Dist\ s3://my.bucket/somefolder/Dist/ --delete > output.txt
The resources I found are:
https://stackoverflow.com/a/16713357/4471711
https://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/redirection.mspx?mfr=true
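Putting that together with the batch file and blat.exe plan from the question, a minimal nightly script might look like the sketch below. The log path, email addresses, mail server and blat switches are assumptions to adapt; the 2>&1 part makes sure error messages (which go to stderr) end up in the same .txt file as the normal output.
@echo off
REM nightly-sync.bat - sketch only; log path, addresses and blat options are assumptions
set AWS_ACCESS_KEY_ID=AKIAIMYACCESSKEY
set AWS_SECRET_ACCESS_KEY=NLnfMySecretAccessCode
set AWS_DEFAULT_REGION=ap-southeast-2
REM stdout goes to the log file; 2>&1 captures error output as well
aws s3 sync C:\somefolder\Data\Dist\ s3://my.bucket/somefolder/Dist/ --delete > C:\logs\s3sync.txt 2>&1
REM email the log; check blat's documentation for the exact switches your mail setup needs
blat.exe C:\logs\s3sync.txt -to ops@example.com -s "Nightly S3 sync" -server mail.example.com
To run it nightly you could register the batch file with Task Scheduler, e.g. schtasks /Create /TN "NightlyS3Sync" /TR "C:\scripts\nightly-sync.bat" /SC DAILY /ST 23:00 (task name, path and time are placeholders).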
Once the initial upload is done, I'm assuming that all new syncs will just upload new and modified files and folders. Using the --delete flag will remove anything in the bucket that no longer exists on the local server.
That is correct: sync will upload new or modified files as compared to the destination (whether that is an S3 bucket or your local machine).
--delete will remove anything in the destination (not necessarily an S3 bucket) that is not in the source. It should be used carefully: if, for example, you have downloaded and modified just one file, your local machine does not hold ALL of the files, so syncing back with --delete would delete every other file at the destination.
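One extra safeguard when --delete is involved: the CLI has a --dryrun flag that prints what sync would upload or delete without actually doing anything, so you can preview (and even email) the effect of a run first, e.g. with the bucket and folder from the question:
aws s3 sync C:\somefolder\Data\Dist\ s3://my.bucket/somefolder/Dist/ --delete --dryrun > preview.txt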

Related

Upload only newly modified files to S3 bucket using Golang aws-sdk

I'm trying to implement a backup mechanism to an S3 bucket in my code.
Each time a condition is met I need to upload an entire directory's contents to an S3 bucket.
I am using this code example:
https://github.com/aws/aws-sdk-go/tree/c20265cfc5e05297cb245e5c7db54eed1468beb8/example/service/s3/sync
It creates an iterator over the directory contents and then uses s3manager.Upload.UploadWithIterator to upload them.
Everything works; however, I noticed that it uploads all files and overwrites the existing files in the bucket even if they weren't modified since the last backup. I only want to upload the delta between backups.
I know the AWS CLI has the command aws s3 sync <dir> <bucket>, which does exactly what I need, but I couldn't find anything equivalent in the aws-sdk documentation.
Appreciate the help, thank you!
There is no such feature in the aws-sdk. You could implement it yourself: for each file, check the hash of both objects before uploading. Or use a community solution such as https://www.npmjs.com/package/s3-sync-client
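If you want to sanity-check the idea before wiring it into your Go code, a rough sketch with the CLI is to compare the stored object's ETag against a local MD5 (bucket, key and file names below are made up). Note the ETag only equals the file's MD5 for single-part uploads without SSE-KMS, so treat it as a heuristic; in the SDK the equivalent is a HeadObject call per file before deciding whether to upload.
# ETag of the object already in the bucket (placeholder names)
aws s3api head-object --bucket my-backup-bucket --key backups/data.bin --query ETag --output text
# MD5 of the local candidate; upload only if the two differ
md5sum data.bin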

How to copy only updated files/folders from aws s3 bucket to local machine?

I have a requirement of copying certain files from an S3 bucket to local machine. Below are the important points to note on my requirement:
The files are kept in the S3 bucket in date-based folders.
The files have a .csv.gz extension and I need to convert them to .csv and copy them to my local machine.
The bucket keeps updating every minute and I need to copy only the new files and process them; the already-processed files need not be copied again.
I have tried using sync, but after a file is processed it gets renamed, so the same csv.gz file is synced to the local folder again.
I am planning to use some scheduled task to con.
Amazon S3 is a storage service. It cannot 'process' files for you.
If you wish to change the contents of a file (eg converting from .csv.gz to .csv), you would need to do this yourself on your local computer.
The AWS Command-Line Interface (CLI) aws s3 sync command makes it easy to copy files that have been changed/added since the previous sync. However, if you are changing the files locally (unzipping), then you will likely need to write your own program to download from Amazon S3.
There are AWS SDKs available for popular programming languages. You can also do a web search to find sample code for using Amazon S3.
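As a sketch of the "copy only new files" half (bucket name, date folder and paths below are placeholders): restrict the sync to the relevant date folder and to *.csv.gz, and decompress into a separate working folder so the downloaded .csv.gz files stay untouched for the next sync comparison; the rename/processing step then never confuses sync.
# pull down only the gzipped CSVs for one date folder (all names are placeholders)
aws s3 sync s3://my-bucket/2024-05-01/ ./incoming/2024-05-01/ --exclude "*" --include "*.csv.gz"
# decompress into a separate working area, leaving the original .csv.gz in place
gzip -dc ./incoming/2024-05-01/report.csv.gz > ./work/report.csv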

How does the aws s3 cli figure out which files to sync?

When using the AWS S3 CLI to sync the files of a web app over to my S3 bucket, I noticed that the sync command always uploads every file, even though the files didn't actually change. What has changed, though, is the timestamp of the files. So I was wondering: how does the sync command figure out which files it needs to upload?
Does sync only compare file name and timestamp?
Timestamp and Size.
If you want sync to consider size only:
--size-only (boolean) Makes the size of each key the only criteria used to decide whether to sync from source to destination.
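For example, to make the comparison ignore timestamps entirely (the paths here are placeholders):
aws s3 sync ./dist s3://my.bucket/app --size-only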

Running java operation directly on AWS S3 target bucket

We have the following workflow at my work:
Download the data from the AWS S3 bucket to the workspace:
aws s3 cp --only-show-errors s3://bucket1
Unzip the data
unzip -q "/workspace/folder1/data.zip" -d "/workspace/folder2"
Run a Java command
java -Xmx1024m -jar param1 etc...
Sync the archive back to the S3 target bucket
aws s3 sync --include #{archive.location} s3://bucket
As you can see, downloading the data from the S3 bucket, unzipping it, running a Java operation on the data and copying it back to S3 costs a lot of time and resources.
Hence, we are planning to unzip directly in the S3 target bucket and run the Java operation there. Would it be possible to run the Java operation directly in the S3 bucket? If yes, could you please provide some insights?
It's not possible to run the Java 'in S3', but what you can do is move your Java code to an AWS Lambda function so that all the work is done 'in the cloud', i.e. there is no need to download to a local machine, process and re-upload.
Without knowing the details of your requirements, I would consider setting up an S3 event notification that fires each time a new file is PUT into a particular location, plus an AWS Lambda function that is invoked with the details of that new file; the Lambda function can then write its results to a different bucket/location.
I have done similar things (though not with Java) and have found it a rock-solid way of processing files.
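For reference, the notification part can be attached to the bucket from the CLI; below is a minimal sketch in which the bucket name, prefix and Lambda ARN are placeholders, and the function must separately be granted permission to be invoked by S3.
notification.json (placeholders throughout):
{
  "LambdaFunctionConfigurations": [
    {
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:process-upload",
      "Events": ["s3:ObjectCreated:Put"],
      "Filter": { "Key": { "FilterRules": [ { "Name": "prefix", "Value": "incoming/" } ] } }
    }
  ]
}
aws s3api put-bucket-notification-configuration --bucket my-bucket --notification-configuration file://notification.json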
No.
You cannot run code on S3.
S3 is an object store, which does not provide any execution environment. To modify the files, you need to download them, make the changes and upload them back to S3.
If you need to perform operations on files, you can look into AWS Elastic File System, which you can mount on your EC2 instance and carry out the operations as required.
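If you go the EFS route, mounting it on the instance is straightforward once the amazon-efs-utils package is installed; a sketch with a placeholder file system ID:
sudo mkdir -p /mnt/efs
sudo mount -t efs fs-12345678:/ /mnt/efs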

Why can I not run dynamic content from Amazon S3?

I know that Amazon S3 is a service for storing static files. But what I don't understand is, if I store some PHP files in an S3 bucket, why isn't it possible to have those files executed from an EC2 instance?
Amazon S3 is a data storage service. When a file is requested from S3, it is sent to the requester, regardless of file format. S3 does not process the file in any way, nor does it pass content to Amazon EC2 for execution.
If you want a PHP file executed by a PHP engine, you will need to run a web server on an Amazon EC2 instance.
Running directly from S3 will never work, as objects stored in S3 aren't presented in a way that your local system can execute.
The good news, however, is that you can pull the PHP down from S3 to your local system and execute it!
I use this method myself, with an instance created by Lambda to do some file processing. Lambda creates the instance; the bash script in the instance UserData does an S3 copy (see below) to pull down the PHP file and the data file that PHP will process, and then PHP is called against my file.
To download a file from S3 with the CLI:
# save as file.php in the current directory
aws s3 cp s3://my-s3-bucket-name/my/s3/file.php .
# or save it under a different filename
aws s3 cp s3://my-s3-bucket-name/my/s3/file.php my-file.php
# or save it in a different folder
aws s3 cp s3://my-s3-bucket-name/my/s3/file.php some/directory/path/file.php
You would then pass this file into PHP for execution like any other file.
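For example, assuming the first copy above, which saved it as file.php in the current directory:
php file.php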