AWS CLI Download list of S3 files

We have ~400,000 files in a private S3 bucket that are inbound/outbound call recordings. The file names follow a pattern that lets me search for numbers, both inbound and outbound. Note that these objects are in the Glacier storage class.
Using the AWS CLI, I can search through this bucket and grep out the files I need. What I'd like to do now is initiate an S3 restore job with expedited retrieval (so a ~1-5 minute recovery time), and then maybe 30 minutes later run a command to download the files.
My efforts so far:
aws s3 ls s3://exetel-logs/ --recursive | grep '.*042222222.*' | cut -c 32-
This retrieves the keys of about 200 files. I am unsure how to proceed next, as aws s3 cp won't work for objects in the Glacier storage class.
Cheers,

The AWS CLI has two separate commands for S3: s3 and s3api. s3 is a high-level abstraction with limited features, so for restoring files you'll have to use one of the commands available with s3api:
aws s3api restore-object --bucket exetel-logs --key your-key --restore-request '{"Days":7,"GlacierJobParameters":{"Tier":"Expedited"}}'
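To restore every matching recording in one pass, the search pipeline from the question can feed restore-object directly. A minimal sketch, assuming the same bucket; the number pattern and the 7-day retention window are placeholders to adjust:

```shell
# Request an Expedited restore for every key matching the number pattern.
# The pattern and the 7-day retention window are placeholders.
bucket="exetel-logs"
pattern="042222222"
aws s3 ls "s3://${bucket}/" --recursive | awk '{ print $NF }' | grep "${pattern}" \
| while read -r key; do
    aws s3api restore-object --bucket "${bucket}" --key "${key}" \
      --restore-request '{"Days":7,"GlacierJobParameters":{"Tier":"Expedited"}}'
  done
```

Expedited retrievals are billed at a higher rate than Standard, so for ~200 files this is fine, but for bulk jobs the Standard or Bulk tier is cheaper.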
If you afterwards want to copy the files, but want to ensure you only copy files whose restore from Glacier has completed, you can use the following snippet:
for key in $(aws s3api list-objects-v2 --bucket exetel-logs --query "Contents[?StorageClass=='GLACIER'].[Key]" --output text); do
  if [ "$(aws s3api head-object --bucket exetel-logs --key "${key}" --query "contains(Restore, 'ongoing-request=\"false\"')")" == "true" ]; then
    echo "${key}"
  fi
done
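Building on that check, the completed restores can be downloaded in the same loop. A sketch; the ./restored destination directory is a placeholder, and --force-glacier-transfer lets aws s3 cp copy a Glacier-class object whose restore has finished:

```shell
# Download each Glacier object whose restore has completed.
# The ./restored destination directory is a placeholder.
mkdir -p ./restored
for key in $(aws s3api list-objects-v2 --bucket exetel-logs \
      --query "Contents[?StorageClass=='GLACIER'].[Key]" --output text); do
  restore=$(aws s3api head-object --bucket exetel-logs --key "$key" \
      --query 'Restore' --output text)
  case "$restore" in
    *'ongoing-request="false"'*)
      aws s3 cp "s3://exetel-logs/$key" "./restored/$key" --force-glacier-transfer ;;
  esac
done
```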

Have you considered using a high-level language wrapper for the AWS CLI? It will make these kinds of tasks easier to integrate into your workflows. I prefer the Python implementation (Boto 3). Here is example code for how to download all files from an S3 bucket.

Related

Copy the latest uploaded file from S3 bucket to local machine

I have a cron job set that moves the files from an EC2 instance to S3
aws s3 mv --recursive localdir s3://bucket-name/ --exclude "*" --include "localdir/*"
After that I use aws s3 sync s3://bucket-name/data1/ E:\Datafolder in .bat file and run task scheduler in Windows to run the command.
The issue is that s3 sync command copies all the files in /data1/ prefix.
So let's say I have the following files:
Day 1: file1 is synced to local.
Day 2: file1 and file2 are both synced to local, because file1 has been removed from the local machine's folder.
I don't want already-downloaded files to occupy space on the local machine again. On Day 2, I just want file2 to be copied over.
Can this be accomplished by AWS CLI commands? or do I need to write a lambda function?
I followed the answer from Get last modified object from S3 using AWS CLI
but on Windows, the | and awk commands are not working as expected.
To obtain the name of the object that has the most recent Last Modified date, you can use:
aws s3api list-objects-v2 --bucket BUCKET-NAME --query 'sort_by(Contents, &LastModified)[-1].Key' --output text
Therefore (using shell syntax), you could use:
object=`aws s3api list-objects-v2 --bucket BUCKET-NAME --prefix data1/ --query 'sort_by(Contents, &LastModified)[-1].Key' --output text`
aws s3 cp s3://BUCKET-NAME/$object E:\Datafolder
You might need to tweak it to get it working on Windows.
Basically, it gets the bucket listing, sorts by LastModified, then grabs the name of the last object in the list.
Here is the answer modified to work in a Windows .bat file (cmd.exe syntax):
for /f "delims=" %%i in ('aws s3api list-objects-v2 --bucket BUCKET-NAME --prefix data1/ --query "sort_by(Contents, &LastModified)[-1].Key" --output text') do set object=%%i
aws s3 cp s3://BUCKET-NAME/%object% E:\Datafolder

AWS S3 old object creation date

We want to find the upload/creation date of the oldest object present in an AWS S3 bucket.
Could you please suggest how we can get it?
You can use the AWS Command-Line Interface (CLI) to list objects sorted by a field:
aws s3api list-objects --bucket MY-BUCKET --query 'sort_by(Contents, &LastModified)[0].[Key,LastModified]' --output text
This gives an output like:
foo.txt 2021-08-17T21:53:46+00:00
See also: How to list recent files in AWS S3 bucket with AWS CLI or Python

How to copy subset of files from one S3 bucket folder to another by date

I have a bucket in AWS S3. There are two folders in the bucket - folder1 & folder2. I want to copy the files from s3://myBucket/folder1 to s3://myBucket/folder2. But there is a twist: I ONLY want to copy the items in folder1 that were created after a certain date. I want to do something like this:
aws s3 cp s3://myBucket/folder1 s3://myBucket/folder2 --recursive --copy-source-if-modified-since 2020-07-31
There is no aws-cli command that will do this for you in a single line. If the number of files is relatively small, say a hundred thousand or fewer, it would be easiest to write a bash script, or use your favourite language's AWS SDK, that lists the first folder, filters on creation date, and issues the copy commands.
If the number of files is large you can create an S3 Inventory that will give you a listing of all the files in the bucket, which you can download and generate the copy commands from. This will be cheaper and quicker than listing when there are lots and lots of files.
Something like this could be a start, using @jarmod's suggestion about --copy-source-if-modified-since:
for key in $(aws s3api list-objects --bucket my-bucket --prefix folder1/ --query 'Contents[].Key' --output text); do
  relative_key=${key/folder1/folder2}
  aws s3api copy-object --bucket my-bucket --key "$relative_key" --copy-source "my-bucket/$key" --copy-source-if-modified-since THE_CUTOFF_DATE
done
It will copy each object individually, and it will be fairly slow if there are lots of objects, but it's at least somewhere to start.
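An alternative sketch that filters by date at listing time instead of at copy time, using a JMESPath comparison on LastModified (the bucket, folder names, and cutoff are the ones from the question):

```shell
# Copy only the folder1/ keys modified on or after the cutoff date.
cutoff="2020-07-31"
for key in $(aws s3api list-objects-v2 --bucket myBucket --prefix folder1/ \
      --query "Contents[?LastModified>='${cutoff}'].Key" --output text); do
  aws s3 cp "s3://myBucket/${key}" "s3://myBucket/${key/folder1/folder2}"
done
```

Filtering in the --query expression means objects outside the date range never even reach the copy step, though the full listing still happens client-side.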

Apply encryption to existing S3 objects without impacting storage class

I am trying to encrypt an existing s3 bucket. When I do this:
aws s3 cp s3://test/ s3://test/ --recursive --sse
it is encrypting all the files in the bucket by re-copying the objects. My issue here is that I have objects in the bucket in Standard, Standard-IA and Glacier storage classes. So, when I run the above copy command the objects in Standard-IA storage are converted to standard storage. (I haven't tested what happens to objects in glacier yet - probably it won't even allow me to copy.)
Is there any way where we can restore the storage type of an object and just enable encryption for an existing bucket?
You could do something like this using bash and jq; Python with boto3 or similar would obviously be cleaner.
You may be better off adding a check to skip the GLACIER files: there's no magic way to apply encryption to them without unfreezing them first, then re-freezing them.
You'll want to run this on an EC2 instance in the same region as the S3 bucket.
#!/bin/bash
bucketname="bucket-name"
aws s3 ls "${bucketname}" --recursive | awk '{ print $NF }' > /tmp/filelist
while read -r file; do
  class=$(aws s3api head-object --bucket "${bucketname}" --key "${file}" | jq -r '.StorageClass')
  if [ "$class" = "null" ]; then
    class="STANDARD"
  fi
  echo "aws s3 cp \"s3://${bucketname}/${file}\" \"s3://${bucketname}/${file}\" --sse --storage-class ${class}"
done < /tmp/filelist
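A possible optimisation on the same idea, sketched here: list-objects-v2 reports the storage class alongside each key, so the command list can be generated from a single listing instead of one head-object call per file (GLACIER objects are skipped, since they can't be re-copied in place):

```shell
#!/bin/bash
# One listing instead of a head-object call per key; GLACIER is skipped.
bucketname="bucket-name"
aws s3api list-objects-v2 --bucket "${bucketname}" \
    --query 'Contents[].[StorageClass,Key]' --output text \
| while IFS="$(printf '\t')" read -r class key; do
    [ "$class" = "GLACIER" ] && continue
    echo "aws s3 cp \"s3://${bucketname}/${key}\" \"s3://${bucketname}/${key}\" --sse --storage-class ${class}"
  done
```

Unlike aws s3 ls, list-objects-v2 returns STANDARD explicitly, so the null check is no longer needed.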
You need to add the command line option --storage-class STANDARD_IA
Does your bucket have a lifecycle policy? If so, it's actually behaving as it's supposed to: you are, in effect, creating a new object in the bucket, so the transition back to Standard is correct.
The approach from Ewan Leith above is really the only way to do it: programmatically determine the current storage class, then set it explicitly on the 'new' object when copying.
Hope this helps...

How do I use the aws cli to set permissions on files in an S3 bucket?

I am new to the aws cli and I've spent a fair amount of time in the documentation but I can't figure out how to set permissions on files after I've uploaded them. So if I uploaded a file with:
aws s3 cp assets/js/d3-4.3.0.js s3://example.example.com/assets/js/
and didn't set access permissions, I need a way to set them. Is there an equivalent to chmod 644 in the aws cli?
And for that matter is there a way to view access permission?
I know I could use the --acl public-read flag with aws s3 cp but if I didn't, can I set access without repeating the full copy command?
The awscli supports two groups of S3 actions: s3 and s3api.
You can use aws s3api put-object-acl to set the ACL permissions on an existing object.
The logic behind there being two sets of actions is as follows:
s3: high-level abstractions with file system-like features such as ls, cp, sync
s3api: one-to-one with the low-level S3 APIs such as put-object, head-bucket
In your case, the command to execute is:
aws s3api put-object-acl --bucket example.example.com --key assets/js/d3-4.3.0.js --acl public-read
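As for viewing the current permissions, get-object-acl is the read side of the same API. A sketch using the bucket and key from the question:

```shell
# Inspect the current grants (owner, grantees, permissions) on the object.
aws s3api get-object-acl --bucket example.example.com \
    --key assets/js/d3-4.3.0.js --output json
```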