Trying to modify ACL on multiple objects and folders on Amazon S3 - amazon-web-services

I need to update the ACL settings of over 1600 objects that are in over 160 folders in an S3 bucket.
The files have already been uploaded into s3.
Specifically, I need to do the following:
Give the owner (i.e. me) FULL CONTROL
disable anonymous/public READs
give my CloudFront user READ access (as determined by the canonical ID retrieved for the Origin Access Identity)
The files and folders have a standard naming convention:
s3://bucket/videos/XXXX/XXXX.mp4
s3://bucket/videos/XXXX/XXXX.webm
s3://bucket/videos/XXXX/XXXX.ogv
s3://bucket/videos/XXXX/XXXX-pf.jpg
s3://bucket/videos/XXXX/XXXX-lg.jpg
s3://bucket/videos/XXXX/XXXX-sm.jpg
XXXX is replaced by a number between 0001 and 9999
What is the easiest way to do that, because using the console is extremely time consuming.
I have s3cmd configured on my server. Can s3cmd handle that...and what would the syntax be?
If not s3cmd, what other tool would be available...command line preferred.

S3 bucket policies are the answer. Taken care of. Just thought I'd mention this.

Related

AWS S3 sync command from one bucket to another

I want to use the AWS S3 sync command to sync a large bucket with another bucket.
I found this answer that say that the files from the bucket synced over the AWS backbone and are not copied to the local machine but I can't find a reference anywhere in the documentation. Does anyone has a proof for this behavior? any formal documentation that explains how it works?
I tried to find something in the documentation but nothing there.
To learn more about the sync command, check CLI docs. You can directly refer to the section named -
Sync from S3 bucket to another S3 bucket
The following sync command syncs objects to a specified bucket and
prefix from objects in another specified bucket and prefix by copying
s3 objects. An s3 object will require copying if one of the following
conditions is true:
The s3 object does not exist in the specified bucket and prefix
destination.
The sizes of the two s3 objects differ.
The last modified time of the source is newer than the last modified time of the destination.
Use the S3 replication capability if you only want to replicate the data that moves from bucket1 to bucket2.

S3 Bucket AWS CLI takes forever to get specific files

I have a log archive bucket, and that bucket has 2.5m+ objects.
I am looking to download some specific time period files. For this I have tried different methods but all of them are failing.
My observation is those queries start from oldest file, but the files I seek are the newest ones. So it takes forever to find them.
aws s3 sync s3://mybucket . --exclude "*" --include "2021.12.2*" --include "2021.12.3*" --include "2022.01.01*"
Am I doing something wrong?
Is it possible to make these query start from newest files so it might take less time to complete?
I also tried using S3 Browser and CloudBerry. Same problem. Tried with a EC2 that is inside the same AWS network. Same problem.
2.5m+ objects in an Amazon S3 bucket is indeed a large number of objects!
When listing the contents of an Amazon S3 bucket, the S3 API only returns 1000 objects per API call. Therefore, when the AWS CLI (or CloudBerry, etc) is listing the objects in the S3 bucket it requires 2500+ API calls. This is most probably the reason why the request is taking so long (and possibly failing due to lack of memory to store the results).
You can possibly reduce the time by specifying a Prefix, which reduces the number of objects returned from the API calls. This would help if the objects you want to copy are all in a sub-folder.
Failing that, you could use Amazon S3 Inventory, which can provide a daily or weekly CSV file listing all objects. You could then extract from that CSV file a list of objects you want to copy (eg use Excel or write a program to parse the file). Then, specifically copy those objects using aws s3 cp or from a programming language. For example, a Python program could parse the script and then use download_file() to download each of the desired objects.
The simple fact is that a flat-structure Amazon S3 bucket with 2.5m+ objects will always be difficult to list. If possible, I would encourage you to use 'folders' to structure the bucket so that you would only need to list portions of the bucket at a time.

AWS CLI listing all the files within a S3 Bucket

I am practicing AWS commands. My client has given me the AWS IAM access key and the secret but not the account that I can log in to the admin panel. Those keys are being used with the project itself. What I am trying to do is that I am trying to list down all the files recursive within a S3 bucket.
This is what I have done so far.
I have configured the AWS profile for CLI using the following command
aws configure
Then I could list all the available buckets by running the following command
aws s3 ls
Then I am trying to list all the files within a bucket. I tried running the following command.
aws s3 ls s3://my-bucket-name
But it seems like it is not giving me the correct content. Also, I need a way to navigate around the bucket too. How can I do that?
You want to list all of the objects recursively but aren't using --recursive flag. This will only show prefixes and any objects at the root level
Relevant docs https://docs.aws.amazon.com/cli/latest/reference/s3/ls.html
A few options.
Roll your own
if you run an aws s3 ls and a line item has the word "PRE" instead of a modify date and size, that means it's a "directory" that you can traverse. You can write a quick bash script to run recursive aws s3 ls commands on everything that returns "PRE" indicating it's hiding more files.
s3fs
Using the S3FS-Fuse project on github, you can mount an S3 bucket on your file system and explore it that way. I haven't tested this and thus can't personally recommend it, but it seems viable, and might be a simple way to use tools you already have and understand (like tree).
One concern that I might have, is when I've used software similar to this it has made a lot of API calls and if left mounted for the long-term, it can run up costs just by the number of API calls.
Sync everything to a localhost (not recommended)
Adding this for completion, but you can run
aws s3 sync s3://mybucket/ ./
This will try to copy everything to your computer and you'll be able to use your own filesystem. However, s3 buckets can hold petabytes of data, so you may not be able to sync it all to your system. Also, s3 provides a lot of strong security precautions to protect the data, which your personal computer probably doesn't.

How to check is S3 service is available or not in AWS via CLI?

We have options to :
1. Copy file/object to another S3 location or local path (cp)
2. List S3 objects (ls)
3. Create bucket (mb) and move objects to bucket (mv)
4. Remove a bucket (rb) and remove an object (rm)
5. Sync objects and S3 prefixes
and many more.
But before using the commands, we need to check if S3 service is available in first place. How to do it?
Is there a command like :
aws S3 -isavailable
and we get response like
0 - S3 is available, I can go ahead upload object/create bucket etc.
1 - S3 is not availble, you can't upload object etc. ?
You should assume that Amazon S3 is available. If there is a problem with S3, you will receive an error when making a call with the Amazon CLI.
If you are particularly concerned, then add a simple CLI command first, eg aws s3 ls and throw away the results. But that's really the same concept. Or, you could use the --dry-run option available on many commands that simply indicates whether you would have had sufficient permissions to make the request, but doesn't actually run the request.
It is more likely that you will have an error in your configuration (eg wrong region, credentials not valid) than S3 being down.

Amazon S3 life cycle rule for folder

My application uses Amazon S3 to store some files, uploaded by customer. I want to set a rule that automatically should watch for particular folder's content, specifically - to delete files, that were created month ago. Is that possible?
Yes you can set a rule that automatically should watch for particular folder's content, specifically - to delete files, that were created month ago.
For this go to 'lifecycle policy' -> 'Expiration'. In 'Expiration' section set prefix as the path to files that you want to apply your rule.
For example: If I want to apply rule to 'fileA.txt' in folder 'myFolder' in bucket 'myBucket'. Then I should set prefix as 'myFolder/'.
Amazon S3 has a flat structure with no hierarchy like you would see in a typical file system. However, for the sake of organizational simplicity, the Amazon S3 console supports the folder concept as a means of grouping objects. Amazon S3 does this by using key name prefixes for objects.
For more info refer: http://docs.aws.amazon.com/AmazonS3/latest/UG/FolderOperations.html
Yes, it is possible to delete/expire and transition the objects to lower cost storage classes in AWS to save cost. You can find it under
S3 - [Your_Folder] - Management - Create Lifecycle rule
provide folder that you want to perform the action in prefix section as "folder/"
Yes. You can setup an S3 lifecycle policy that will make S3 automatically delete all files older than X days: http://docs.aws.amazon.com/AmazonS3/latest/dev/ObjectExpiration.html