S3 bucket size for subset bucket names

How can I use a custom list of S3 bucket names from a local file? Listing everything sometimes takes too long for large buckets or buckets with objects in different storage classes, and I'm not sure it always shows all of my S3 buckets.
with open('subsetbucketslist.txt') as f:
    allbuckets = f.read().splitlines()
How can I use a local file of bucket names as input?
By default, it lists all buckets:
import boto3

total_size = 0
s3 = boto3.resource('s3')
for mybucket in s3.buckets.all():
    mybucket_size = sum(object.size for object in mybucket.objects.all())
    print(mybucket.name, mybucket_size)

If you want to calculate the size of particular buckets only, put those bucket names in your for loop:
import boto3

total_size = 0
s3 = boto3.resource('s3')

with open('subsetbucketslist.txt') as f:
    allbuckets = f.read().splitlines()

for bucket_name in allbuckets:
    mybucket_size = sum(object.size for object in s3.Bucket(bucket_name).objects.all())
    print(bucket_name, mybucket_size)
It's also worth mentioning that Amazon CloudWatch keeps track of bucket sizes (BucketSizeBytes). See: Metrics and dimensions - Amazon Simple Storage Service
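For large buckets the CloudWatch metric is much faster than listing every object, since it is pre-computed once per day. A rough sketch of reading it with boto3 (assuming your objects are in the Standard storage class; other classes use different StorageType dimension values):

import boto3
from datetime import datetime, timedelta, timezone

# Sketch: read the daily BucketSizeBytes metric instead of summing object sizes.
cloudwatch = boto3.client('cloudwatch')

with open('subsetbucketslist.txt') as f:
    allbuckets = f.read().splitlines()

now = datetime.now(timezone.utc)
for bucket_name in allbuckets:
    response = cloudwatch.get_metric_statistics(
        Namespace='AWS/S3',
        MetricName='BucketSizeBytes',
        Dimensions=[
            {'Name': 'BucketName', 'Value': bucket_name},
            {'Name': 'StorageType', 'Value': 'StandardStorage'},  # assumption: Standard class
        ],
        StartTime=now - timedelta(days=2),
        EndTime=now,
        Period=86400,
        Statistics=['Average'],
    )
    datapoints = response['Datapoints']
    # Use the most recent datapoint, if any has been published yet
    size = max(datapoints, key=lambda d: d['Timestamp'])['Average'] if datapoints else 0
    print(bucket_name, int(size))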

Related

AWS lambda with dynamic trigger for S3 buckets

I have an S3 bucket named "archive_A". I have created a Lambda function (Python) that is triggered when an object is created in, or permanently deleted from, the bucket; it collects metadata about the object and inserts it into DynamoDB.
For bucket archive_A, I added the two triggers ("creation" and "permanently delete") to my Lambda function manually via the console.
import boto3
from uuid import uuid4

def lambda_handler(event, context):
    s3 = boto3.client("s3")
    dynamodb = boto3.resource('dynamodb')
    for record in event['Records']:
        bucket_name = record['s3']['bucket']['name']
        object_key = record['s3']['object']['key']
        size = record['s3']['object'].get('size', -1)
        event_name = record['eventName']
        event_time = record['eventTime']
        dynamoTable = dynamodb.Table('S3metadata')
        dynamoTable.put_item(
            Item={'Resource_id': str(uuid4()), 'Bucket': bucket_name, 'Object': object_key,
                  'Size': size, 'Event': event_name, 'EventTime': event_time})
In the future there could be more S3 buckets such as archive_B, archive_C, etc. In that case I would have to keep adding the triggers manually for each bucket, which is a bit cumbersome.
Is there any dynamic way of adding triggers to the Lambda function for buckets named "archive_*", so that any future bucket such as "archive_G" gets its triggers added automatically?
Please suggest; I am quite new to AWS. An example would be easier to follow.
There is no built-in way to automatically add triggers for new buckets.
You could probably create an Amazon EventBridge rule that triggers on CreateBucket and calls an AWS Lambda function with details of the new bucket.
That Lambda function could then programmatically add a trigger on your existing Lambda function.
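A rough sketch of such a "trigger-adder" Lambda function, assuming an EventBridge rule matching the CloudTrail CreateBucket event invokes it, and that the metadata function's ARN is supplied via an environment variable named TARGET_LAMBDA_ARN (both the rule and the variable name are assumptions, not existing resources):

import os
import boto3

# Sketch only: invoked by an assumed EventBridge rule on the CloudTrail CreateBucket event.
s3 = boto3.client('s3')
lambda_client = boto3.client('lambda')

def lambda_handler(event, context):
    bucket_name = event['detail']['requestParameters']['bucketName']
    if not bucket_name.startswith('archive_'):
        return  # only wire up buckets that match the naming convention
    target_arn = os.environ['TARGET_LAMBDA_ARN']  # assumed env var: ARN of the metadata Lambda

    # Allow S3 notifications from this bucket to invoke the metadata Lambda
    lambda_client.add_permission(
        FunctionName=target_arn,
        StatementId=f'{bucket_name}-invoke',
        Action='lambda:InvokeFunction',
        Principal='s3.amazonaws.com',
        SourceArn=f'arn:aws:s3:::{bucket_name}',
    )

    # Add the event notification on the new bucket.
    # Note: this call replaces any notification configuration already on the bucket.
    s3.put_bucket_notification_configuration(
        Bucket=bucket_name,
        NotificationConfiguration={
            'LambdaFunctionConfigurations': [{
                'LambdaFunctionArn': target_arn,
                'Events': ['s3:ObjectCreated:*', 's3:ObjectRemoved:Delete'],
            }]
        },
    )

The EventBridge rule would need CloudTrail management events enabled so that CreateBucket calls are delivered, and the trigger-adder function needs IAM permissions for lambda:AddPermission and s3:PutBucketNotification.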

How do I make this Python script go through S3 object keys and report which buckets are empty (have no keys)?

This function prints the list of S3 buckets as well as their object keys. How can I make it print only the empty buckets?
import boto3

s3 = boto3.resource('s3')

def empty_s3():
    # This will print the list of all buckets
    print("\nList of S3 buckets:")
    for bucket in s3.buckets.all():
        print(bucket.name)
        # This will print the bucket's object keys
        for object in bucket.objects.all():
            print(object)
You can simply check the number of objects:
import boto3

s3_resource = boto3.resource('s3')

for bucket in s3_resource.buckets.all():
    objects = list(bucket.objects.all())
    # Empty bucket?
    if len(objects) == 0:
        print(bucket.name)
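If some buckets are large, listing every object just to see whether any exist can be slow. A small variation (a sketch, using the collection's limit() method to fetch at most one key per bucket) avoids that:

import boto3

s3_resource = boto3.resource('s3')

for bucket in s3_resource.buckets.all():
    # Ask for at most one key; an empty result means the bucket is empty
    if not list(bucket.objects.limit(1)):
        print(bucket.name)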

I would like to know how to import data into the app by modifying this lambda code

import boto3
import json

s3 = boto3.client('s3')

def lambda_handler(event, context):
    bucket = "cloud-translate-output"
    key = "key value"
    try:
        data = s3.get_object(Bucket=bucket, Key=key)
        json_data = data["Body"].read()
        return {
            "response_code": 200,
            "data": str(json_data)
        }
    except Exception as e:
        print(e)
        raise e
I'm making an iOS app with Xcode, and I want to use AWS to bring data from S3 into the app, in the order app -> API Gateway -> Lambda -> S3. If I upload a text file to bucket number 1 of S3, CloudFormation will translate the uploaded file and automatically save the result to bucket number 2. I then want to import the translated text file stored in bucket number 2 into the app through Lambda, without hard-coding a key value. Is there a way to do this using only the name of the bucket?
If I upload a text file to bucket number 1 of S3, CloudFormation will translate the uploaded file and automatically save the result to bucket number 2.
Sadly, this is not how CloudFormation works. It can't automatically read or translate files in buckets, or upload them to other buckets.
I would stick with a Lambda function; it is better suited to such tasks.
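If the goal is for the Lambda function to fetch the translated file from bucket number 2 without knowing its key in advance, one option is to list the bucket and pick the most recently modified object. A sketch, reusing the cloud-translate-output bucket name from the question and assuming the newest object is the one you want:

import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    bucket = "cloud-translate-output"  # bucket name from the question
    # List the bucket (up to 1,000 keys) and pick the most recently modified object
    listing = s3.list_objects_v2(Bucket=bucket)
    contents = listing.get('Contents', [])
    if not contents:
        return {"response_code": 404, "data": ""}
    latest = max(contents, key=lambda obj: obj['LastModified'])
    data = s3.get_object(Bucket=bucket, Key=latest['Key'])
    return {
        "response_code": 200,
        "data": data["Body"].read().decode("utf-8")
    }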

boto3 get all keys in filtered bucket as a list

I'd like to get all keys in a "subfolder" in an s3 bucket, and put these keys into a list so that I can use the list for multiprocessing the files on s3. My current approach is below, and takes about 5 minutes for a "subfolder" that has ~800,000 items in it. Is there a quicker way of accomplishing this?
import boto3

s3 = boto3.resource('s3')
mybucket = s3.Bucket('bucket-name')

ls_keys = []
for obj in mybucket.objects.filter(Prefix='foo/bar'):
    ls_keys.append(obj.key)
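One variation worth trying is the low-level client's list_objects_v2 paginator, which skips the resource-layer object wrappers. A sketch only ('bucket-name' and 'foo/bar' are the placeholders from the question); note that S3 returns at most 1,000 keys per underlying request either way, so the speedup may be modest:

import boto3

s3_client = boto3.client('s3')
paginator = s3_client.get_paginator('list_objects_v2')

ls_keys = []
for page in paginator.paginate(Bucket='bucket-name', Prefix='foo/bar'):
    # 'Contents' is absent on empty pages, hence the default
    ls_keys.extend(obj['Key'] for obj in page.get('Contents', []))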

Boto3 get only S3 buckets of specific region

The following code sadly lists all buckets of all regions and not only from "eu-west-1" as specified. How can I change that?
import boto3

s3 = boto3.client("s3", region_name="eu-west-1")
for bucket in s3.list_buckets()["Buckets"]:
    bucket_name = bucket["Name"]
    print(bucket["Name"])
s3 = boto3.client("s3", region_name="eu-west-1")
connects to the S3 API endpoint in eu-west-1; it doesn't limit the listing to buckets in eu-west-1. One solution is to query each bucket's location and filter.
s3 = boto3.client("s3")
for bucket in s3.list_buckets()["Buckets"]:
    if s3.get_bucket_location(Bucket=bucket['Name'])['LocationConstraint'] == 'eu-west-1':
        print(bucket["Name"])
If you prefer a one-liner using a Python list comprehension:
region_buckets = [bucket["Name"] for bucket in s3.list_buckets()["Buckets"] if s3.get_bucket_location(Bucket=bucket['Name'])['LocationConstraint'] == 'eu-west-1']
print(region_buckets)
The solution above does not always work, because for buckets in us-east-1 the 'LocationConstraint' is returned as null (None). Here is another solution:
s3 = boto3.client("s3")
for bucket in s3.list_buckets()["Buckets"]:
    if s3.head_bucket(Bucket=bucket['Name'])['ResponseMetadata']['HTTPHeaders']['x-amz-bucket-region'] == 'us-east-1':
        print(bucket["Name"])
The SDK method:
s3.head_bucket(Bucket=[INSERT_BUCKET_NAME_HERE])['ResponseMetadata']['HTTPHeaders']['x-amz-bucket-region']
... should always give you the bucket region. Thanks to sd65 for the tip: https://github.com/boto/boto3/issues/292