I would like to synchronize an S3 bucket with a single directory on multiple Windows EC2 instances. When a file is uploaded to or deleted from the bucket, I would like it to be immediately pushed to or removed from all of the instances. New instances will be added/removed frequently (multiple times per week). Files will be uploaded/deleted frequently as well. File sizes could be up to 2 GB. What AWS services or features can solve this?
Based on what you've described, I'd propose the following solution to this problem.
You need to create an SNS topic and configure S3 event notifications on the bucket to publish to it. Then run a script on each of your machines that subscribes to this topic and updates local files based on the changes coming from S3; it needs to handle object creation, updates, and deletion.
When the script starts, first sync the full contents of the bucket to the machine using the AWS CLI (aws s3 sync), then keep applying the notifications as they arrive.
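As a minimal sketch of such a script in Python with boto3 (this assumes one common fan-out pattern, not spelled out above: each instance gets its own SQS queue subscribed to the SNS topic; the queue URL and local directory are hypothetical placeholders):

import json
import os
import urllib.parse

import boto3

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/s3-sync-queue"  # hypothetical
LOCAL_DIR = r"C:\s3-mirror"  # hypothetical local target directory

sqs = boto3.client("sqs")
s3 = boto3.client("s3")

while True:
    # Long-poll this instance's queue for S3 event notifications.
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20
    )
    for msg in resp.get("Messages", []):
        envelope = json.loads(msg["Body"])       # SNS envelope
        event = json.loads(envelope["Message"])  # S3 event notification
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            # Keys in S3 event notifications are URL-encoded.
            key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
            path = os.path.join(LOCAL_DIR, key.replace("/", os.sep))
            if record["eventName"].startswith("ObjectCreated"):
                os.makedirs(os.path.dirname(path), exist_ok=True)
                s3.download_file(bucket, key, path)
            elif record["eventName"].startswith("ObjectRemoved"):
                if os.path.exists(path):
                    os.remove(path)
        # Remove the message once the change has been applied locally.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])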
Yes, I have used the AWS CLI aws s3 sync command to keep a local server's content updated with S3 changes. It synchronizes a local target directory's files with a bucket or prefix.
http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
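For example, a hedged illustration (the bucket name and local path are hypothetical), mirroring a bucket into a local directory:
aws s3 sync s3://my-bucket C:\content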
Edit: The following answer covers syncing from EC2 to an S3 bucket (source: EC2, destination: bucket).
If there were only one instance, the aws s3 sync command alone (with the --delete option) would work for both putting files into the S3 bucket and deleting them from it.
But the case here involves multiple instances, so using aws s3 sync with the --delete option causes a problem.
To explain it simply, consider Instance I1 with files a.jpg & b.jpg to be synced to the bucket.
A cron job then syncs those files to the S3 bucket.
Now consider Instance I2, which has files c.jpg & d.jpg.
When this instance's cron job runs with --delete, it uploads c.jpg & d.jpg but also deletes a.jpg & b.jpg from the bucket, because those files don't exist on Instance I2.
To rectify the problem, there are two approaches:
Sync all files across all instances (costly, and it defeats the purpose of using S3 as the central store).
Sync files without the --delete option, and implement deletion separately using aws s3 rm (see the example after this list).
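For example, a hedged sketch of the second approach (the paths and bucket name are hypothetical): each instance's cron job only uploads new and changed files, and deletions are issued explicitly for specific objects:
aws s3 sync /data s3://my-bucket/data
aws s3 rm s3://my-bucket/data/old-file.jpg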
Related
I'm running a cron job on an EC2 instance that backs up a database dump and a folder (with files and subfolders) to an S3 bucket.
I only want to back up new and modified files in order to save costs. Is this possible?
I'm currently using aws s3 cp; maybe there is an argument or another command?
Thanks.
Use aws s3 sync instead of aws s3 cp and it will do this automatically for you.
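For example (the local path and bucket name are hypothetical), sync compares the source against the destination and only transfers new or changed files:
aws s3 sync /var/backups s3://my-bucket/backups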
I'm trying to keep a replica of my S3 bucket in a local folder. It should be updated whenever a change occurs in the bucket.
You can use the AWS CLI aws s3 sync command to copy ('synchronize') files from an Amazon S3 bucket to a local drive.
To have it update frequently, you could schedule it as a Windows Scheduled Task. Please note that it will be making frequent calls to AWS, which will incur API charges ($0.005 per 1,000 requests).
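A minimal sketch of such a task (the task name, five-minute schedule, bucket, and local path are all illustrative assumptions):
schtasks /Create /SC MINUTE /MO 5 /TN "S3Sync" /TR "aws s3 sync s3://my-bucket C:\s3-mirror"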
Alternatively, you could use utilities that 'mount' an Amazon S3 bucket as a drive (TntDrive, CloudBerry, Mountain Duck, etc.). I'm not sure how they detect changes -- they possibly create a 'virtual drive' where the data is not actually downloaded until it is accessed.
You can use rclone and WinFsp to mount S3 as a drive.
Though this might not be a 'mount' in traditional terms.
You will need to set up a scheduled task for a continuous sync.
Example: https://blog.spikeseed.cloud/mount-s3-as-a-disk/
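For example, a hedged sketch (it assumes a remote named mys3 has already been set up with rclone config; the drive letter is arbitrary):
rclone mount mys3:my-bucket X: --vfs-cache-mode writes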
I have a bucket with a folder s3://mybucket/abc/thisFolder which contains thousands of files.
I can use aws s3 rm s3://mybucket/abc/thisFolder --recursive to delete it and all the files inside, and it does this fine, one by one.
However, there's also a delete-folder command, but to me the official doc is not very clear. Its example says aws workdocs delete-folder --folder-id 26fa8aa4ba2071447c194f7b150b07149dbdb9e1c8a301872dcd93a4735ce65d
I would like to know what workdocs is in the example above, and how I would obtain the long --folder-id string for my folder s3://mybucket/abc/thisFolder?
Thank you.
Amazon WorkDocs is a separate, Dropbox-like document sharing service; it is unrelated to Amazon S3, so its delete-folder command cannot be used on your bucket.
If you wish to delete objects in Amazon S3, then you should only use AWS CLI commands that start with aws s3 or aws s3api.
Another way to delete folders in Amazon S3 is to configure Amazon S3 object lifecycle management with a rule that deletes objects with a given prefix. The objects might take a while to be deleted (~24 hours), but it will happen automatically rather than one by one.
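As a hedged sketch of that lifecycle approach in Python with boto3 (the bucket and prefix come from the question; the rule ID and one-day expiry are illustrative assumptions):

import boto3

s3 = boto3.client("s3")

# Expire (delete) every object under the given prefix; S3 applies the
# rule in the background, typically within about a day.
s3.put_bucket_lifecycle_configuration(
    Bucket="mybucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-thisFolder",  # illustrative rule name
                "Filter": {"Prefix": "abc/thisFolder/"},
                "Status": "Enabled",
                "Expiration": {"Days": 1},  # minimum allowed value is 1 day
            }
        ]
    },
)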
I have enabled S3 replication in an account to replicate the S3 data to another account, and it all works fine. But I don't want to use S3 versioning because of its additional cost.
So is there any other way to accommodate this scenario?
Automated Same-Region Replication (SRR) and Cross-Region Replication (CRR) require versioning to be activated due to the way that data is replicated between S3 buckets. For example, a new version of an object might be uploaded while a bucket is still being replicated, which can lead to problems without separate versions.
If you do not wish to retain other versions, you can configure Amazon S3 Lifecycle Rules to expire (delete) older versions.
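A hedged sketch of such a rule in Python with boto3 (the bucket name and one-day window are illustrative assumptions):

import boto3

s3 = boto3.client("s3")

# Keep only the current version of each object; previous versions are
# deleted one day after they become noncurrent.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-replicated-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-old-versions",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "Status": "Enabled",
                "NoncurrentVersionExpiration": {"NoncurrentDays": 1},
            }
        ]
    },
)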
An alternative method would be to run the AWS CLI aws s3 sync command at regular intervals to copy the data between buckets. This command would need to be run on an Amazon EC2 instance or even your own computer. It could be triggered by a cron schedule (Linux) or a Scheduled Task (Windows).
I am trying to set up cross region replication so that my original file will be replicated to two different regions. Right now, I can only get it to replicate to one other region.
For example, my files are on US Standard. When a file is uploaded it is replicated from US Standard to US West 2. I would also like for that file to be replicated to US West 1.
Is there a way to do this?
It appears that Cross-Region Replication in Amazon S3 cannot be chained. Therefore, it cannot be used to replicate from Bucket A to Bucket B to Bucket C.
An alternative would be to use the AWS Command-Line Interface (CLI) to synchronise between buckets, e.g.:
aws s3 sync s3://bucket1 s3://bucket2
aws s3 sync s3://bucket1 s3://bucket3
The sync command only copies new and changed files. Data is transferred directly between the Amazon S3 buckets, even if they are in different regions -- no data is downloaded/uploaded to your own computer.
So, put these commands in a cron job or a Scheduled Task to run once an hour and the buckets will nicely replicate!
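For instance, a hedged crontab sketch of that hourly schedule (it assumes the AWS CLI and credentials are available to the cron user):
0 * * * * aws s3 sync s3://bucket1 s3://bucket2 && aws s3 sync s3://bucket1 s3://bucket3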
See: AWS CLI S3 sync command documentation