I have a bucket with folder s3://mybucket/abc/thisFolder which contains thousands of files inside.
I can use aws s3 rm s3://mybucket/abc/thisFolder --recursive to delete it and all files inside, and it does it fine one by one.
However, there is also a delete-folder command, but the official documentation is not very clear to me. Its example says aws workdocs delete-folder --folder-id 26fa8aa4ba2071447c194f7b150b07149dbdb9e1c8a301872dcd93a4735ce65d
I would like to know what workdocs is in the example above, and how I would obtain the long --folder-id string for my folder s3://mybucket/abc/thisFolder.
Thank you.
Amazon WorkDocs is a Dropbox-like document collaboration service, so its delete-folder command has nothing to do with Amazon S3.
If you wish to delete objects in Amazon S3, then you should only use AWS CLI commands that start with aws s3 or aws s3api.
Another way to delete folders in Amazon S3 is to configure S3 Object Lifecycle Management with a rule that expires objects with a given prefix. The objects might take a while to be deleted (~24 hours), but it happens automatically rather than one-by-one.
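For example, a rule along these lines (a rough sketch; the bucket name, rule ID and one-day expiry are assumptions to adjust for your case) would expire everything under the abc/thisFolder/ prefix:
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "expire-thisFolder",
      "Filter": { "Prefix": "abc/thisFolder/" },
      "Status": "Enabled",
      "Expiration": { "Days": 1 }
    }
  ]
}
EOF
aws s3api put-bucket-lifecycle-configuration --bucket mybucket --lifecycle-configuration file://lifecycle.json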
Related
I'm running a cron job on an EC2 instance that backs up a database dump and a folder (with files and subfolders) to an S3 bucket.
I only want to back up new and modified files in order to save costs. Is this possible?
I'm currently using aws s3 cp; maybe there is an argument or another command?
Thanks
Use aws s3 sync instead of aws s3 cp and it will do this automatically for you.
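For example (a minimal sketch; the local path and bucket/prefix are placeholders for your own):
aws s3 sync /path/to/backup s3://mybucket/backup/
Only files that are new or that differ in size or modification time from the copy already in S3 are uploaded, so repeated runs only transfer what changed.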
Currently, my S3 bucket contains files. I want to create a folder for each file on S3.
Current -> s3://<bucket>/test.txt
Expectation -> s3://<bucket>/test/test.txt
How can I achieve this using the EC2 instance?
S3 doesn't really have "folders"; object keys may contain / characters, which in a way emulates folders. Simply name your objects test/<filename> to achieve that. See the S3 docs for more.
As for doing it from EC2, it is no different from doing it from anywhere else (except that, on EC2, you may be able to rely on an IAM instance profile instead of ad-hoc credentials). If you've tried it and failed, maybe post a new question with more details.
If you have Linux you can try something like:
aws s3 ls s3://bucket/ | while read -r date time size name; do aws s3 mv "s3://bucket/${name}" "s3://bucket/${name%.*}/${name}"; done
This does not depend on the EC2 instance. You can run the AWS CLI from an EC2 instance or from anywhere else, specifying the desired destination path, in your case s3://<bucket>/test/test.txt. You can even change the name of the file you are copying into the S3 bucket, including its extension, if you want.
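For example (a sketch using the placeholder bucket from the question; the second source path and target name are hypothetical):
aws s3 cp s3://<bucket>/test.txt s3://<bucket>/test/test.txt
aws s3 cp /local/path/report.csv s3://<bucket>/test/report-2024.csv
The second command shows that the destination key, including the file name and extension, can differ from the source.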
We have the following workflow at my work:
Download the data from the AWS S3 bucket to the workspace:
aws s3 cp --only-show-errors s3://bucket1
Unzip the data
unzip -q "/workspace/folder1/data.zip" -d "/workspace/folder2"
Run a java command
java -Xmx1024m -jar param1 etc...
Sync the archive back to the S3 target bucket
aws s3 sync --include #{archive.location} s3://bucket
As you can see, downloading the data from the S3 bucket, unzipping it, running the Java operation on the data and copying it back to S3 costs a lot of time and resources.
Hence, we are planning to unzip directly in the S3 target bucket and run the Java operation there. Would it be possible to run the Java operation directly in the S3 bucket? If yes, could you please provide some insights?
It's not possible to run the Java 'in S3', but what you can do is move your Java code to an AWS Lambda function, so all the work can be done 'in the cloud', i.e. there is no need to download to a local machine, process, and re-upload.
Without knowing the details of your requirements, I would consider setting up an S3 event notification that fires each time a new file is PUT into a particular location, and an AWS Lambda function that gets invoked with the details of that new file, then have Lambda write the results to a different bucket/location.
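As a rough sketch (the bucket name, Lambda function ARN and prefix are assumptions, and the function must already allow S3 to invoke it), the notification can be wired up from the CLI:
cat > notification.json <<'EOF'
{
  "LambdaFunctionConfigurations": [
    {
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:process-upload",
      "Events": ["s3:ObjectCreated:Put"],
      "Filter": { "Key": { "FilterRules": [ { "Name": "prefix", "Value": "incoming/" } ] } }
    }
  ]
}
EOF
aws s3api put-bucket-notification-configuration --bucket mybucket --notification-configuration file://notification.json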
I have done similar things (though not with Java) and have found it a rock-solid way of processing files.
No.
You cannot run code on S3.
S3 is an object store, which doesn't provide any execution environment. To make any modifications to the files, you need to download them, modify them, and upload them back to S3.
If you need to perform operations on files, you can look into using Amazon Elastic File System (EFS), which you can mount on your EC2 instance and then perform the operations as required.
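For example, an EFS file system can be mounted over NFS and then treated like a normal directory (a rough sketch; the file system ID, region and mount point are hypothetical):
sudo mkdir -p /mnt/efs
sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs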
We have options to:
1. Copy file/object to another S3 location or local path (cp)
2. List S3 objects (ls)
3. Create bucket (mb) and move objects to bucket (mv)
4. Remove a bucket (rb) and remove an object (rm)
5. Sync objects and S3 prefixes
and many more.
But before using the commands, we need to check whether the S3 service is available in the first place. How do we do that?
Is there a command like:
aws S3 -isavailable
and we get a response like:
0 - S3 is available, I can go ahead and upload objects/create buckets etc.
1 - S3 is not available, you can't upload objects etc.?
You should assume that Amazon S3 is available. If there is a problem with S3, you will receive an error when making a call with the AWS CLI.
If you are particularly concerned, then run a simple CLI command first, e.g. aws s3 ls, and throw away the results. But that's really the same concept. Or you could use the --dry-run option available on many commands, which simply indicates whether you would have had sufficient permissions to make the request but doesn't actually run it.
It is more likely that you will have an error in your configuration (eg wrong region, credentials not valid) than S3 being down.
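If you do want a quick pre-flight check along the lines suggested above, the CLI's exit code gives you the 0/non-zero behaviour you are after (a rough sketch; the bucket name is a placeholder):
if aws s3api head-bucket --bucket mybucket > /dev/null 2>&1; then
    echo "S3 call succeeded - go ahead and upload objects"
else
    echo "S3 call failed - check credentials, region, bucket name or service status"
fi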
I would like to synchronize an S3 bucket with a single directory on multiple Windows EC2 instances. When a file is uploaded to or deleted from the bucket, I would like it to be immediately pushed to or removed from all of the instances. New instances will be added/removed frequently (multiple times per week). Files will be uploaded/deleted frequently as well. The file sizes could be up to 2 GB. What AWS services or features can solve this?
Based on what you've described, I'd propose the following solution to this problem.
You need to create an SNS topic for S3 change notifications. Then you need a script on each of your machines that subscribes to this topic. The script updates files on the machine based on the changes coming from S3, so it should support basic CRUD operations.
Run this script, and sync the contents of your S3 bucket to each machine when it starts, using the aws-cli sync mentioned above.
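A rough sketch of the wiring (the topic name, account ID, region and endpoint are placeholders): create the topic once, then have each instance subscribe its local endpoint to it.
aws sns create-topic --name s3-change-notifications
aws sns subscribe --topic-arn arn:aws:sns:us-east-1:123456789012:s3-change-notifications --protocol http --notification-endpoint http://<instance-address>/s3-events
The bucket's event notifications then need to point at this topic, and each instance's script has to confirm the subscription request that SNS sends to its endpoint.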
Yes, I have used the AWS CLI s3 sync command to keep a local server's content updated with S3 changes. It allows a local target directory's files to be synchronized with a bucket or prefix.
http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
Edit: The following answer is for syncing EC2 to an S3 bucket, i.e. source: EC2 & destination: bucket.
If it were only one instance, then aws s3 sync (with the --delete option) alone would have worked both for putting files into the S3 bucket and for deleting them.
But the case here is for multiple instances, so if we use aws s3 sync with the --delete option, there is a problem.
To explain it simply, consider instance I1 with files a.jpg & b.jpg to be synced to the bucket.
Now a CRON job has synced the files with the S3 bucket.
Now we have instance I2, which has files c.jpg & d.jpg.
So when the CRON job of this instance runs, it puts the files c.jpg & d.jpg and also deletes the files a.jpg & b.jpg, because those files don't exist on instance I2.
So, to rectify the problem, we have two approaches:
Sync all files across all instances (costly, and defeats the purpose of S3 altogether).
Sync files without the --delete option, and implement the deletion separately (using aws s3 rm), as in the sketch below.
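A rough sketch of the second approach (the local path, bucket and object key are placeholders):
aws s3 sync /var/app/files s3://<bucket>/files
aws s3 rm s3://<bucket>/files/a.jpg
The sync without --delete only uploads new and changed files, so objects put there by other instances are left untouched; the explicit aws s3 rm is run only for objects you actually want removed.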