Count objects in the individual folders of S3 bucket - amazon-web-services

I know we can easily count the total objects of S3 bucket with the below command, but how to get the count of objects in the individual folders of S3 bucket?
aws s3 ls s3://mybucket/ --recursive | wc -l
For example, if the bucket has subfolders like below, I want to know the count of objects in each date folder
~$aws s3 ls s3://mybucket/
PRE 2019-01-01/
PRE 2019-01-02/
PRE 2019-01-03/
PRE 2019-01-04/

You could use something like this Python 3 script:
import boto3
from pathlib import Path

s3_resource = boto3.resource('s3', region_name='ap-southeast-2')
bucket = s3_resource.Bucket('my-bucket')

folders = {}
for obj in bucket.objects.all():
    path = Path(obj.key).parent
    folders[path] = folders.get(path, 0) + 1

for folder, count in folders.items():
    print(folder, count)
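If you only care about the top-level date folders rather than every nested parent, the grouping step reduces to counting the first path component of each key. A pure-Python sketch of just that grouping, using a hypothetical key list so there is no S3 call:

```python
from collections import Counter

def count_top_level(keys):
    """Count objects under each top-level 'folder' (first path component).
    Keys with no '/' live at the bucket root and are skipped here."""
    return Counter(key.split('/', 1)[0] for key in keys if '/' in key)

# Hypothetical listing of object keys:
keys = ['2019-01-01/a.txt', '2019-01-01/b.txt', '2019-01-02/c.txt']
print(count_top_level(keys))
```

In the real script you would feed it `obj.key for obj in bucket.objects.all()` instead of the hard-coded list.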

Related

Copy nested Amazon S3 folders into flattened folder

Long story short, we have documents stored something like this /accounts/account-abc/docs/uuid.pdf which is pretty redundant. What we want is basically docs/uuid.pdf. There are lots of other posts about copying, but they are all single dirs. I need something like this (which is obviously wrong):
aws s3 cp s3://accounts/*/docs s3://docs/ --recursive ---include "*"
Would I need to write a custom script in order to accomplish the above?
Here's a Python script that will copy files from a given SOURCE_PATH to a TARGET_PATH, removing all sub-folders:
import boto3

SOURCE_BUCKET = 'source-bucket'
SOURCE_PATH = 'accounts/'
TARGET_BUCKET = 'target-bucket'
TARGET_PATH = 'docs/'

s3_resource = boto3.resource('s3')
bucket = s3_resource.Bucket(SOURCE_BUCKET)

for obj in bucket.objects.filter(Prefix=SOURCE_PATH):
    # Keep only the filename after the last '/'
    target_key = obj.key[obj.key.rfind('/') + 1:]
    print('Copying', target_key)
    s3_resource.Object(TARGET_BUCKET, TARGET_PATH + target_key).copy(
        {'Bucket': SOURCE_BUCKET, 'Key': obj.key})
    # Optional, to delete the source object:
    # obj.delete()
You might need to modify it if you only wish to copy from a SOURCE_PATH that also contains a sub-directory of docs (based on your example).
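For that modification, where only objects under a docs/ sub-directory should be copied, the filter is a simple substring check on the key. A sketch with hypothetical keys following the layout in the question:

```python
def docs_keys(keys):
    """Keep only the keys that sit under a docs/ sub-folder."""
    return [key for key in keys if '/docs/' in key]

# Hypothetical keys following the accounts/<account>/docs/<file> layout:
keys = ['accounts/account-abc/docs/uuid.pdf', 'accounts/account-abc/notes.txt']
print(docs_keys(keys))
```

In the script above you would apply this check inside the loop before copying.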

How to download data from AWS in python

I am new to AWS and boto. The data I want to download is on AWS, and I have the access key and the secret key. My problem is I do not understand the approaches I found. For instance, this code:
import boto
import boto.s3.connection

def download_data_connect_s3(access_key, secret_key, region, bucket_name, key, local_path):
    conn = boto.connect_s3(aws_access_key_id=access_key,
                           aws_secret_access_key=secret_key,
                           host='s3-{}.amazonaws.com'.format(region),
                           calling_format=boto.s3.connection.OrdinaryCallingFormat())
    bucket = conn.get_bucket(bucket_name)
    key = bucket.get_key(key)
    key.get_contents_to_filename(local_path)
    print('Downloaded File {} to {}'.format(key, local_path))

region = 'us-west-1'
access_key = # the key here
secret_key = # the secret key here
bucket_name = 'temp_name'
key = '<folder…/filename>'  # unique identifier
local_path = # local path

download_data_connect_s3(access_key, secret_key, region, bucket_name, key, local_path)
What I don't understand is the 'key' 'bucket_name' and 'local path'. What is 'key' in comparison to access key and secret key? I was not given a 'key'. Also, is the 'bucket_name' the name of the bucket on AWS (I was not provided with the bucket name); and local path the directory where I want to save the data?
You are right.
bucket_name = the name of your S3 bucket
key = the object key, i.e. the full path of the file inside the bucket (e.g. if you have a file named a.txt in folder x, then key = x/a.txt)
local_path = where you want to save the data on your local machine
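To make the bucket/key distinction concrete: an S3 URI is just the bucket name followed by the object key, and the object key has nothing to do with your access or secret keys (those are credentials). A small sketch with a hypothetical URI:

```python
def split_s3_uri(uri):
    """Split 's3://my-bucket/x/a.txt' into ('my-bucket', 'x/a.txt').
    Everything after the bucket name is the object key."""
    bucket, _, key = uri[len('s3://'):].partition('/')
    return bucket, key

print(split_s3_uri('s3://temp_name/folder/filename.csv'))
```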
It sounds like the data is stored in Amazon S3.
You can use the AWS Command-Line Interface (CLI) to access Amazon S3.
To view the list of buckets in that account:
aws s3 ls
To view the contents of a bucket:
aws s3 ls bucket-name
To copy a file from a bucket to the current directory:
aws s3 cp s3://bucket-name/filename.txt .
Or sync a whole folder:
aws s3 sync s3://bucket-name/folder/ local-folder/

How to delete images with suffix from folder in S3 bucket

I have stored multiple sizes of the image on s3.
e.g. image100_100,image200_200,image300_150;
I want to delete images of a specific size, such as those with the suffix 200_200, from the folder. There are a lot of images in this folder, so how can I delete them?
Use AWS command-line interface (AWS CLI):
aws s3 rm s3://Path/To/Dir/ --recursive --exclude "*" --include "*200_200"
We first exclude everything, then include what we need to delete. This is a workaround to mimic the behavior of rm -r "*200_200" command in Linux.
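The exclude-then-include trick works because the CLI applies filters in order and the last filter that matches a key wins. A pure-Python sketch of that rule (fnmatch approximates the CLI's glob matching; this is not the CLI's actual code):

```python
import fnmatch

def selected_for_delete(keys, filters):
    """Apply AWS-CLI-style --exclude/--include filters in order;
    the last filter that matches a key wins.
    filters is a list of ('exclude' | 'include', pattern) pairs."""
    selected = []
    for key in keys:
        keep = True  # with no filters, everything is included
        for kind, pattern in filters:
            if fnmatch.fnmatch(key, pattern):
                keep = (kind == 'include')
        if keep:
            selected.append(key)
    return selected

# Exclude everything, then re-include the *200_200 images:
print(selected_for_delete(
    ['image100_100', 'image200_200', 'image300_150'],
    [('exclude', '*'), ('include', '*200_200')]))
```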
The easiest method would be to write a Python script, similar to:
import boto3

BUCKET = 'my-bucket'
PREFIX = ''  # eg 'images/'

s3_client = boto3.client('s3', region_name='ap-southeast-2')

# Get a list of objects
list_response = s3_client.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)

while True:
    # Find desired objects to delete
    objects = [{'Key': obj['Key']}
               for obj in list_response['Contents']
               if obj['Key'].endswith('200_200')]
    print('Deleting:', objects)
    # Delete objects
    if len(objects) > 0:
        delete_response = s3_client.delete_objects(
            Bucket=BUCKET,
            Delete={'Objects': objects}
        )
    # Next page
    if list_response['IsTruncated']:
        list_response = s3_client.list_objects_v2(
            Bucket=BUCKET,
            Prefix=PREFIX,
            ContinuationToken=list_response['NextContinuationToken'])
    else:
        break

Copy Without Prefix s3

I have directory structures in s3 like
bucket/folder1/*/*.csv
Where the folder wildcard refers to a number of different folders containing csv files.
I want to copy them at without the prefix to
bucket/folder2/*.csv
Ex:
bucket/folder1/
s3distcp --src=s3://bucket/folder1/ --dest=s3://bucket/folder2/ --srcPattern=.*/csv
Results in the undesired structure of:
bucket/folder2/*/*.csv
I need a solution to copy in bulk that is scalable. Can I do this with s3distcp? Can I do this with aws s3 cp (without having to execute the aws s3 cp per file)?
You could try the following CLI command (note that aws s3 sync is recursive by default, so no --recursive flag is needed):
aws s3 sync s3://SOURCE_BUCKET_NAME s3://DESTINATION_BUCKET_NAME
There is no shortcut to do what you wish, because you are manipulating the path to the objects.
You could instead write a little program to do it, such as:
import boto3

BUCKET = 'my-bucket'

s3_client = boto3.client('s3', region_name='ap-southeast-2')

# Get a list of objects in folder1
response = s3_client.list_objects_v2(Bucket=BUCKET, Prefix='folder1')

# Copy files to folder2, keeping a flat hierarchy
for obj in response['Contents']:
    key = obj['Key']
    print(key)
    s3_client.copy_object(
        CopySource={'Bucket': BUCKET, 'Key': key},
        Bucket=BUCKET,
        Key='folder2' + key[key.rfind('/'):]
    )
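The key manipulation in the loop above is plain string surgery; isolated as a pure function (names are illustrative, and it assumes the key contains at least one '/'):

```python
def flatten_key(key, target_prefix='folder2'):
    """Map 'folder1/sub/file.csv' to 'folder2/file.csv' by keeping
    only the final path component. Assumes key contains a '/'."""
    return target_prefix + key[key.rfind('/'):]

print(flatten_key('folder1/2020/a.csv'))
```

Note that files in different sub-folders with the same filename will collide under the flat prefix, so check for duplicates before copying in bulk.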
Ended up using Apache Nifi to do this, changing the filename attribute of the flowfile (use regex to remove all of the path before the last '/') and writing with a prefix to the desired directory. It scales really well.

How to rename files and folder in Amazon S3?

Is there any function to rename files and folders in Amazon S3? Any related suggestions are also welcome.
I just tested this and it works:
aws s3 --recursive mv s3://<bucketname>/<folder_name_from> s3://<bucket>/<folder_name_to>
There is no direct method to rename a file in S3. What you have to do is copy the existing file with a new name (just set the target key) and delete the old one.
aws s3 cp s3://source_folder/ s3://destination_folder/ --recursive
aws s3 rm s3://source_folder --recursive
You can use the AWS CLI commands to mv the files
You can either use AWS CLI or s3cmd command to rename the files and folders in AWS S3 bucket.
Using S3cmd, use the following syntax to rename a folder,
s3cmd --recursive mv s3://<s3_bucketname>/<old_foldername>/ s3://<s3_bucketname>/<new_folder_name>
Using AWS CLI, use the following syntax to rename a folder,
aws s3 --recursive mv s3://<s3_bucketname>/<old_foldername>/ s3://<s3_bucketname>/<new_folder_name>
I've just got this working. You can use the AWS SDK for PHP like this:
use Aws\S3\S3Client;

$sourceBucket = '*** Your Source Bucket Name ***';
$sourceKeyname = '*** Your Source Object Key ***';
$targetBucket = '*** Your Target Bucket Name ***';
$targetKeyname = '*** Your Target Key Name ***';

// Instantiate the client.
$s3 = S3Client::factory();

// Copy an object.
$s3->copyObject(array(
    'Bucket' => $targetBucket,
    'Key' => $targetKeyname,
    'CopySource' => "{$sourceBucket}/{$sourceKeyname}",
));
http://docs.aws.amazon.com/AmazonS3/latest/dev/CopyingObjectUsingPHP.html
This is now possible for files: select the file, then select Actions > Rename in the GUI.
To rename a folder, you instead have to create a new folder, select the contents of the old one, and copy/paste it across (under "Actions" again).
We have 2 ways by which we can rename a file on AWS S3 storage -
1. Using the CLI tool -
aws s3 --recursive mv s3://bucket-name/dirname/oldfile s3://bucket-name/dirname/newfile
2. Using the SDK
$s3->copyObject(array(
    'Bucket' => $targetBucket,
    'Key' => $targetKeyname,
    'CopySource' => "{$sourceBucket}/{$sourceKeyname}",
));
To rename a folder (which is technically a set of objects with a common prefix as key) you can use the aws CLI move command with --recursive option.
aws s3 mv s3://bucket/old_folder s3://bucket/new_folder --recursive
There is no way to rename a folder through the GUI; the fastest (and easiest, if you like the GUI) way to achieve this is to perform a plain old copy. To achieve this: create the new folder on S3 using the GUI, go to your old folder, select all, mark "copy", then navigate to the new folder and choose "paste". When done, remove the old folder.
This simple method is very fast because it is copies from S3 to itself (no need to re-upload or anything like that) and it also maintains the permissions and metadata of the copied objects like you would expect.
Here's how you do it in .NET, using S3 .NET SDK:
var client = new Amazon.S3.AmazonS3Client(_credentials, _config);
client.CopyObject(oldBucketName, oldfilepath, newBucketName, newFilePath);
client.DeleteObject(oldBucketName, oldfilepath);
P.S. try to use the "Async" versions of the client methods where possible, even though I haven't done so here for readability.
This works for renaming the file in the same folder
aws s3 mv s3://bucketname/folder_name1/test_original.csv s3://bucketname/folder_name1/test_renamed.csv
Below is a code example to rename a file on S3. My file was part-000* because of a Spark output file; I copy it to another file name in the same location and delete the part-000* file:
import boto3

client = boto3.client('s3')

response = client.list_objects(
    Bucket='lsph',
    MaxKeys=10,
    Prefix='03curated/DIM_DEMOGRAPHIC/',
    Delimiter='/'
)
name = response["Contents"][0]["Key"]

copy_source = {'Bucket': 'lsph', 'Key': name}
client.copy_object(Bucket='lsph', CopySource=copy_source,
                   Key='03curated/DIM_DEMOGRAPHIC/' + 'DIM_DEMOGRAPHIC.json')
client.delete_object(Bucket='lsph', Key=name)
Files and folders are in fact objects in S3. You should use PUT Object - Copy to rename them. See http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectCOPY.html
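The copy-then-delete pattern that answer describes can be wrapped in a small helper. This is a sketch that assumes a boto3 S3 client is passed in (copy_object and delete_object are the boto3 client methods for these API calls):

```python
def rename_object(client, bucket, old_key, new_key):
    """S3 has no native rename: copy the object to the new key,
    then delete the old one. `client` is assumed to be a boto3 S3
    client (or anything exposing the same copy_object/delete_object
    methods)."""
    client.copy_object(Bucket=bucket,
                       CopySource={'Bucket': bucket, 'Key': old_key},
                       Key=new_key)
    client.delete_object(Bucket=bucket, Key=old_key)
```

Usage would be something like `rename_object(boto3.client('s3'), 'my-bucket', 'old.txt', 'new.txt')`; note the copy happens server-side, so the object's data is never downloaded.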
Rename all the *.csv.err files in the <<bucket>>/landing dir into *.csv files with s3cmd:
export aws_profile='foo-bar-aws-profile'
while read -r f ; do
    tgt_fle=$(echo $f | perl -ne 's/^(.*).csv.err/$1.csv/g;print')
    echo s3cmd -c ~/.aws/s3cmd/$aws_profile.s3cfg mv $f $tgt_fle
done < <(s3cmd -r -c ~/.aws/s3cmd/$aws_profile.s3cfg ls --acl-public --guess-mime-type \
    s3://$bucket | grep -i landing | grep csv.err | cut -d" " -f5)
As answered by Naaz, direct renaming in S3 is not possible.
I have attached a code snippet which will copy all the contents. The code is working; just add your AWS access key and secret key.
Here's what the code does:
-> copies the source folder contents (nested children and folders) and pastes them into the destination folder
-> when the copying is complete, deletes the source folder
package com.bighalf.doc.amazon;

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.List;
import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.CopyObjectRequest;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.PutObjectRequest;
import com.amazonaws.services.s3.model.S3ObjectSummary;

public class Test {

    public static boolean renameAwsFolder(String bucketName, String keyName, String newName) {
        boolean result = false;
        try {
            AmazonS3 s3client = getAmazonS3ClientObject();
            List<S3ObjectSummary> fileList = s3client.listObjects(bucketName, keyName).getObjectSummaries();
            // some metadata to create empty folders - start
            ObjectMetadata metadata = new ObjectMetadata();
            metadata.setContentLength(0);
            InputStream emptyContent = new ByteArrayInputStream(new byte[0]);
            // some metadata to create empty folders - end
            // the final location is where the child folder contents of the existing folder should go
            String finalLocation = keyName.substring(0, keyName.lastIndexOf('/') + 1) + newName;
            for (S3ObjectSummary file : fileList) {
                String key = file.getKey();
                // updating the child folder location with the new location
                String destinationKeyName = key.replace(keyName, finalLocation);
                if (key.charAt(key.length() - 1) == '/') {
                    // if the name ends with the suffix (/) it is a folder
                    PutObjectRequest putObjectRequest = new PutObjectRequest(bucketName, destinationKeyName, emptyContent, metadata);
                    s3client.putObject(putObjectRequest);
                } else {
                    // if the name does not end with the suffix (/) it is a file
                    CopyObjectRequest copyObjRequest = new CopyObjectRequest(bucketName,
                            file.getKey(), bucketName, destinationKeyName);
                    s3client.copyObject(copyObjRequest);
                }
            }
            boolean isFolderDeleted = deleteFolderFromAws(bucketName, keyName);
            return isFolderDeleted;
        } catch (Exception e) {
            e.printStackTrace();
        }
        return result;
    }

    public static boolean deleteFolderFromAws(String bucketName, String keyName) {
        boolean result = false;
        try {
            AmazonS3 s3client = getAmazonS3ClientObject();
            // deleting folder children
            List<S3ObjectSummary> fileList = s3client.listObjects(bucketName, keyName).getObjectSummaries();
            for (S3ObjectSummary file : fileList) {
                s3client.deleteObject(bucketName, file.getKey());
            }
            // deleting the actual passed folder
            s3client.deleteObject(bucketName, keyName);
            result = true;
        } catch (Exception e) {
            e.printStackTrace();
        }
        return result;
    }

    public static void main(String[] args) {
        initializeAmazonObjects();
        boolean result = renameAwsFolder(bucketName, keyName, newName);
        System.out.println(result);
    }

    private static AWSCredentials credentials = null;
    private static AmazonS3 amazonS3Client = null;
    private static final String ACCESS_KEY = "";
    private static final String SECRET_ACCESS_KEY = "";
    private static final String bucketName = "";
    private static final String keyName = "";
    // renaming folder c to x from key name
    private static final String newName = "";

    public static void initializeAmazonObjects() {
        credentials = new BasicAWSCredentials(ACCESS_KEY, SECRET_ACCESS_KEY);
        amazonS3Client = new AmazonS3Client(credentials);
    }

    public static AmazonS3 getAmazonS3ClientObject() {
        return amazonS3Client;
    }
}
In the AWS console, if you navigate to S3, you will see your folders listed. If you navigate to the folder, you will see the object(s) listed. Right-click and you can rename. Or, you can check the box in front of your object, then from the pull-down menu named ACTIONS, select Rename. Just worked for me, 3-31-2019.
If you want to rename a lot of files from an s3 folder you can run the following script.
FILES=$(aws s3api list-objects --bucket your_bucket --prefix 'your_path' --delimiter '/' | jq -r '.Contents[] | select(.Size > 0) | .Key' | sed '<your_rename_here>')
for i in $FILES
do
aws s3 mv s3://<your_bucket>/${i}.gz s3://<your_bucket>/${i}
done
What I did was create a new folder and move the older file objects to the new folder.
There are a lot of 'issues' with folder structures in S3, it seems, as the storage is flat.
I have a Django project where I needed the ability to rename a folder but still keep the directory structure intact, meaning empty folders would need to be copied and stored in the renamed directory as well.
The aws cli is great, but neither cp nor sync nor mv copied empty folders (i.e. files ending in '/') over to the new folder location, so I used a mixture of boto3 and the aws cli to accomplish the task.
More or less, I find all folders in the renamed directory, then use boto3 to put them in the new location, then I cp the data with the aws cli and finally remove it.
import threading
import os
from django.conf import settings
from django.contrib import messages
from django.core.files.storage import default_storage
from django.shortcuts import redirect
from django.urls import reverse


def rename_folder(request, client_url):
    """
    :param request:
    :param client_url:
    :return:
    """
    current_property = request.session.get('property')
    if request.POST:
        # name the change
        new_name = request.POST['name']
        # old full path with www.[].com?
        old_path = request.POST['old_path']
        # remove the query string
        old_path = ''.join(old_path.split('?')[0])
        # remove the .com prefix item so we have the path in the storage
        old_path = ''.join(old_path.split('.com/')[-1])
        # remove empty values, this will happen at the end due to these being folders
        old_path_list = [x for x in old_path.split('/') if x != '']
        # remove the last folder element with split()
        base_path = '/'.join(old_path_list[:-1])
        # now build the new path
        new_path = base_path + f'/{new_name}/'
        endpoint = settings.AWS_S3_ENDPOINT_URL
        # recursively add the files
        copy_command = f"aws s3 --endpoint={endpoint} cp s3://{old_path} s3://{new_path} --recursive"
        remove_command = f"aws s3 --endpoint={endpoint} rm s3://{old_path} --recursive"
        # get_creds() is nothing special, it simply returns the elements needed via boto3
        client, resource, bucket, resource_bucket = get_creds()
        path_viewing = f'{"/".join(old_path.split("/")[1:])}'
        directory_content = default_storage.listdir(path_viewing)
        # loop over folders and add them by default, since the aws cli does not
        # copy empty ones; this is used to accommodate that
        folders, files = directory_content
        for folder in folders:
            new_key = new_path + folder + '/'
            # we must remove the bucket name for this to work
            new_key = new_key.split(f"{bucket}/")[-1]
            # push this to a new thread
            threading.Thread(target=put_object, args=(client, bucket, new_key,)).start()
            print(f'{new_key} added')
        # run the command, which will copy all data
        os.system(copy_command)
        print('Copy Done...')
        os.system(remove_command)
        print('Remove Done...')
        print('Folder renamed.')
        messages.success(request, f'Folder Renamed to: {new_name}')
    return redirect(request.META.get('HTTP_REFERER', f"{reverse('home', args=[client_url])}"))
S3DirectoryInfo has a MoveTo method that will move one directory into another directory, such that the moved directory will become a subdirectory of the other directory with the same name as it originally had.
The extension method below will move one directory to another directory, i.e. the moved directory will become the other directory. What it actually does is create the new directory, move all the contents of the old directory into it, and then delete the old one.
public static class S3DirectoryInfoExtensions
{
    public static S3DirectoryInfo Move(this S3DirectoryInfo fromDir, S3DirectoryInfo toDir)
    {
        if (toDir.Exists)
            throw new ArgumentException("Destination for Rename operation already exists", "toDir");
        toDir.Create();
        foreach (var d in fromDir.EnumerateDirectories())
            d.MoveTo(toDir);
        foreach (var f in fromDir.EnumerateFiles())
            f.MoveTo(toDir);
        fromDir.Delete();
        return toDir;
    }
}
There is a piece of software that lets you perform different kinds of operations on an S3 bucket.
Software Name: S3 Browser
S3 Browser is a freeware Windows client for Amazon S3 and Amazon CloudFront. Amazon S3 provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. Amazon CloudFront is a content delivery network (CDN). It can be used to deliver your files using a global network of edge locations.
If it's only a one-time task, then you can use the command line to perform these operations:
(1) Rename the folder in the same bucket:
s3cmd --access_key={access_key} --secret_key={secret_key} mv s3://bucket/folder1/* s3://bucket/folder2/
(2) Rename the Bucket:
s3cmd --access_key={access_key} --secret_key={secret_key} mv s3://bucket1/folder/* s3://bucket2/folder/
Where,
{access_key} = your valid access key for the s3 client
{secret_key} = your valid secret key for the s3 client
It's working fine without any problem.
Thanks