Django-Storages: change S3 bucket on an existing object

I have a django app that allows files to be uploaded to an S3 bucket using django-storages. The file is for a part that needs to be approved. Once it is approved, I'd like to move the file to a different S3 bucket.
class DesignData(models.Model):
    file = models.FileField(storage=PublicMediaStorage())
    ...

class PublicMediaStorage(S3Boto3Storage):
    location = "media"
    default_acl = "public-read"
    file_overwrite = False
After approval, I copy the file over to the new bucket using:
client.copy_object(
    Bucket=settings.AWS_APPROVED_STORAGE_BUCKET_NAME,
    CopySource=copy_source,
    Key=design_data["s3key"],
)
The file gets moved correctly; however, I now need to update my object. How can I update it? Trying something like myObject.file = "newbucket/myfile.txt" won't work, as it expects an actual file. I've read that I should be able to update the URL with myObject.file.url = "newbucketaddress/myfile.txt", but I get an error: AttributeError: can't set attribute.
Is there a way with django-storages and S3 to change the bucket of an existing file?

You may have to manually build the path to the new bucket; the Boto3 docs will guide you.
Get the bucket location, bucket name, and object key, and you can build the path to the object.
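For example, here is a minimal boto3 sketch that builds such a URL; the bucket name and key below are placeholders, not values from the question:
import boto3

# Sketch: build a virtual-hosted-style URL for the copied object.
# "my-approved-bucket" and the key are illustrative placeholders.
s3 = boto3.client("s3")
bucket = "my-approved-bucket"
key = "media/myfile.txt"

# get_bucket_location returns None as LocationConstraint for us-east-1
region = s3.get_bucket_location(Bucket=bucket)["LocationConstraint"] or "us-east-1"
url = f"https://{bucket}.s3.{region}.amazonaws.com/{key}"
print(url)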

I ended up using somewhat of a workaround to resolve my issue: I changed my model to include a second file field, which stores the object in the new S3 bucket.
class DesignData(models.Model):
    file = models.FileField(storage=PublicMediaStorage())
    approved_file = models.FileField(storage=ApprovedPublicMediaStorage())
I added ApprovedPublicMediaStorage():
class ApprovedPublicMediaStorage(S3Boto3Storage):
    location = "media"
    default_acl = "public-read"
    file_overwrite = False
    custom_domain = "newbucketlocation0.s3...."
I learned this the hard way, so make sure to include custom_domain; otherwise it will use the default_storage from settings, if one has been assigned. After I copy the file over to the new storage, I delete the old file and set file = None, so only approved_file points to an S3 object.
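For reference, the approval step can be sketched roughly like this. It is only a sketch: the media/ key prefix and the nullable file field are assumptions, so adapt it to how your storage builds keys.
import boto3
from django.conf import settings

def approve_design(design):
    # Sketch: copy the upload to the approved bucket, point
    # approved_file at the copy, then remove the original.
    s3 = boto3.client("s3")
    name = design.file.name  # name stored on the FileField

    s3.copy_object(
        Bucket=settings.AWS_APPROVED_STORAGE_BUCKET_NAME,
        CopySource={"Bucket": settings.AWS_STORAGE_BUCKET_NAME, "Key": f"media/{name}"},
        Key=f"media/{name}",
    )

    # No re-upload happens here; the FieldFile just stores the name and
    # resolves URLs through ApprovedPublicMediaStorage.
    design.approved_file.name = name
    design.file.delete(save=False)  # remove the original object
    design.file = None  # assumes the field is nullable
    design.save()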

Related

Unable to get object metadata from S3. Check object key, region and/or access permissions

I have a Lambda function that scans for text and is triggered by an S3 bucket. I get this error when trying to upload a photo directly into the S3 bucket using a browser:
Unable to get object metadata from S3. Check object key, region, and/or access permissions
However, if I hardcode a key (e.g., image01.jpg) that exists in my bucket, there are no errors.
import json
import boto3

def lambda_handler(event, context):
    # Get bucket and file name
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']
    location = key[:17]

    s3Client = boto3.client('s3')
    client = boto3.client('rekognition', region_name='us-east-1')
    response = client.detect_text(Image={'S3Object':
                                         {'Bucket': 'myarrowbucket', 'Name': key}})
    detectedText = response['TextDetections']
I am confused, as it was working a few weeks ago, but now I am getting that error.
ANSWER
I have seen this question answered many times and I tried every solution; the one that worked for me was the key name. I was getting the metadata error when the filename contained special characters (e.g. - or _), but when I changed the names of the uploaded files it worked. Hope this answer helps someone.
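For anyone hitting the same error: keys delivered in S3 event notifications are URL-encoded (spaces become +, other special characters become %XX), so passing them to other APIs verbatim can fail with exactly this message. Here is a minimal sketch of decoding the key first; it is an alternative to renaming the files, not what the answer above did.
import urllib.parse
import boto3

def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    # Decode the URL-encoded key from the S3 event before reusing it
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])

    rekognition = boto3.client('rekognition', region_name='us-east-1')
    response = rekognition.detect_text(
        Image={'S3Object': {'Bucket': bucket, 'Name': key}}
    )
    return [t['DetectedText'] for t in response['TextDetections']]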

Grouping S3 Files Into Subfolders

I have a pipeline that moves approximately 1 TB of data, all CSV files. In this pipeline there are hundreds of files with different names. They have a date component, which is automatically partitioned. My question is how to use the CDK to automatically create subfolders based on the name of the file. In other words, the data comes in as a broad category, but our data scientists need it at one more level of detail.
It appears that your requirement is to move incoming objects into folders based on information in their filename (Key).
This could be done by adding a trigger on the Amazon S3 bucket that triggers an AWS Lambda function when a new object is created.
Here is some code from Moving file based on filename with Amazon S3:
import boto3
import urllib.parse

def lambda_handler(event, context):
    # Get the bucket and object key from the Event
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])

    # Only copy objects that were uploaded to the bucket root (to avoid an infinite loop)
    if '/' not in key:
        # Determine destination directory based on Key
        directory = key  # Your logic goes here to extract the directory name

        # Copy object
        s3_client = boto3.client('s3')
        s3_client.copy_object(
            Bucket=bucket,
            Key=f"{directory}/{key}",
            CopySource={'Bucket': bucket, 'Key': key}
        )

        # Delete source object
        s3_client.delete_object(
            Bucket=bucket,
            Key=key
        )
You would need to modify the code that determines the name of the destination directory based on the key of the new object.
It also assumes that new objects will come into the top-level (root) of the bucket and then be moved into sub-directories. If, instead, new objects are coming in a given path (eg incoming/) then only set the S3 trigger to operate on that path and remove the if '/' not in key logic.
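As a hedged illustration of that placeholder logic: if your filenames followed a convention such as <category>_<date>.csv (an assumption, not something stated in the question), the destination directory could be derived like this:
# Hypothetical convention: "sales_2023-01-01.csv" -> directory "sales"
def directory_for(key: str) -> str:
    return key.split('_', 1)[0]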

Generate presigned s3 URL of latest object in the bucket using boto3

I have an S3 bucket with multiple folders. How can I generate an S3 presigned URL for the latest object, using Python boto3 in AWS, for each folder asked about by a user?
You can do something like
import boto3
from botocore.client import Config
import requests

bucket = 'bucket-name'
folder = '/'  # you can add a folder path here; don't forget the '/' at the end

s3 = boto3.client('s3', config=Config(signature_version='s3v4'))
objs = s3.list_objects(Bucket=bucket, Prefix=folder)['Contents']
latest = max(objs, key=lambda x: x['LastModified'])
print(latest)

print("Generating pre-signed url...")
url = s3.generate_presigned_url(
    ClientMethod='get_object',
    Params={
        'Bucket': bucket,
        'Key': latest['Key']
    }
)
print(url)

response = requests.get(url)
print(response.url)
This will give the most recently modified file from the whole bucket; you can update the logic and the Prefix value as needed.
If you are running in a Kubernetes pod, a VM, or anywhere else, you can pass environment variables or use a Python dict to store the latest key if required.
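If you do need the latest object per folder, as the question asks, here is a sketch that paginates over a given prefix (so it also works past 1,000 keys); the function and parameter names below are my own:
import boto3

def latest_presigned_url(bucket: str, folder: str, expires: int = 3600) -> str:
    # Sketch: return a presigned URL for the most recently modified
    # object under the given prefix, paginating through all keys.
    s3 = boto3.client('s3')
    paginator = s3.get_paginator('list_objects_v2')

    latest = None
    for page in paginator.paginate(Bucket=bucket, Prefix=folder):
        for obj in page.get('Contents', []):
            if latest is None or obj['LastModified'] > latest['LastModified']:
                latest = obj

    if latest is None:
        raise ValueError(f"No objects found under prefix {folder!r}")

    return s3.generate_presigned_url(
        'get_object',
        Params={'Bucket': bucket, 'Key': latest['Key']},
        ExpiresIn=expires,
    )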
If it's a small bucket then recursively list the bucket, with prefix as needed. Sort the results by timestamp, and create the pre-signed URL for the latest.
If it's a very large bucket, this will be very inefficient and you should consider other ways to store the key of the latest file. For example: trigger a Lambda function whenever an object is uploaded and write that object's key into a LATEST item in DynamoDB (or other persistent store).
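A rough sketch of that event-driven approach, assuming a DynamoDB table named latest-objects with a partition key id (both names are assumptions), triggered by S3 object-created events:
import urllib.parse
import boto3

TABLE = 'latest-objects'  # hypothetical table name

def lambda_handler(event, context):
    # Sketch: on every upload, overwrite a single LATEST item with the new
    # key, so finding the latest object never requires listing the bucket.
    ddb = boto3.client('dynamodb')
    for record in event['Records']:
        key = urllib.parse.unquote_plus(record['s3']['object']['key'])
        ddb.put_item(
            TableName=TABLE,
            Item={
                'id': {'S': 'LATEST'},
                'key': {'S': key},
            },
        )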

How to create a folder in an amazon S3 bucket using terraform

I was able to create a bucket in Amazon S3 using this link.
I used the following code to create the bucket:
resource "aws_s3_bucket" "b" {
bucket = "my_tf_test_bucket"
acl = "private"
}
Now I wanted to create folders inside the bucket, say Folder1.
I found the link for creating an S3 object, but it has a mandatory parameter source. I am not sure what this value has to be, since my intent is to create a folder inside the S3 bucket.
For running Terraform on Mac or Linux, the following will do what you want:
resource "aws_s3_bucket_object" "folder1" {
bucket = "${aws_s3_bucket.b.id}"
acl = "private"
key = "Folder1/"
source = "/dev/null"
}
If you're on Windows you can use an empty file.
While folks will be pedantic about S3 not having folders, there are a number of operations where having an object placeholder for a key prefix (otherwise called a folder) makes life easier, like s3 sync for example.
Actually, there is a canonical way to create one without being OS-dependent: by inspecting the network traffic when you create a folder in the UI, you can see the content headers it sends, as stated by https://stackoverflow.com/users/1554386/alastair-mccormack.
And S3 does support folders these days, as visible from the UI.
So this is how you can achieve it:
resource "aws_s3_bucket_object" "base_folder" {
bucket = "${aws_s3_bucket.default.id}"
acl = "private"
key = "${var.named_folder}/"
content_type = "application/x-directory"
kms_key_id = "key_arn_if_used"
}
Please note the trailing slash; otherwise it creates an empty file.
The above has been used on Windows to successfully create a folder with the Terraform aws_s3_bucket_object resource.
The answers here are outdated; it's now definitely possible to create an empty folder in S3 via Terraform, using the aws_s3_object resource as follows:
resource "aws_s3_bucket" "this_bucket" {
bucket = "demo_bucket"
}
resource "aws_s3_object" "object" {
bucket = aws_s3_bucket.this_bucket.id
key = "demo/directory/"
}
If you don't supply a source for the object, Terraform will create an empty directory.
IMPORTANT: note the trailing slash; it ensures you get a directory and not an empty file.
S3 doesn't support folders. Objects can have prefix names with slashes that look like folders, but that's just part of the object name. So there's no way to create a folder in terraform or anything else, because there's no such thing as a folder in S3.
http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html
http://docs.aws.amazon.com/AWSImportExport/latest/DG/ManipulatingS3KeyNames.html
If you want to pretend, you could create a zero-byte object in the bucket named "Folder1/" but that's not required. You can just create objects with key names like "Folder1/File1" and it will work.
This is an old answer, but if you specify a key that includes a folder (which doesn't exist yet), Terraform will create the folder automatically for you:
terraform {
  backend "s3" {
    bucket  = "mysql-staging"
    key     = "rds-mysql-state/terraform.tfstate"
    region  = "us-west-2"
    encrypt = true
  }
}
I would like to add to this discussion that you can create a set of empty folders by providing the resource a set of strings:
resource "aws_s3_object" "default_s3_content" {
for_each = var.default_s3_content
bucket = aws_s3_bucket.bucket.id
key = "${each.value}/"
}
where var.default_s3_content is a set of strings:
variable "default_s3_content" {
description = "The default content of the s3 bucket upon creation of the bucket"
type = set(string)
default = ["folder1", "folder2", "folder3", "folder4", "folder5"]
}
v0.12.8 introduces a new fileset() function, which can be used in combination with for_each to support this natively:
NEW FEATURES:
lang/funcs: New fileset function, for finding static local files that match a glob pattern. (#22523)
A sample usage of this function is as follows (from here):
# Given the file structure from the initial issue:
# my-dir
# |- file_1
# |- dir_a
# |  |- file_a_1
# |  |- file_a_2
# |- dir_b
# |  |- file_b_1
# |- dir_c
# And given the expected behavior of the base_s3_key prefix in the initial issue
resource "aws_s3_bucket_object" "example" {
  for_each = fileset(path.module, "my-dir/**/file_*")

  bucket = aws_s3_bucket.example.id
  key    = replace(each.value, "my-dir", "base_s3_key")
  source = each.value
}
At the time of this writing, v0.12.8 is a day old (released on 2019-09-04), so the documentation at https://www.terraform.io/docs/providers/aws/r/s3_bucket_object.html does not yet reference it. I am not certain whether that's intentional.
As an aside, if you use the above, remember to update/create version.tf in your project like so:
terraform {
  required_version = ">= 0.12.8"
}

Upload subdirectories with TransferUtility to S3

I want to upload all the files and folders inside a directory to an S3 bucket, including all the files in all the subdirectories. I thought of using TransferUtility to do this. Though the link here says that 'By default, Amazon S3 only uploads the files at the root of the specified directory. You can, however, specify to recursively upload files in all the subdirectories.', I couldn't find a way to do this; I couldn't find any property where I can specify that all subdirectories should be included. I tried using SearchOption = System.IO.SearchOption.AllDirectories and SearchPattern = "*" to achieve this, but it still uploaded only the files in the top-most directory. Please help me with this. Thanks.
I'm using the code below:
TransferUtility directoryTransferUtility = new TransferUtility(s3Client);

TransferUtilityUploadDirectoryRequest uRequest = new TransferUtilityUploadDirectoryRequest()
{
    Directory = dirPath,
    BucketName = bucketName,
    SearchOption = System.IO.SearchOption.AllDirectories,
    SearchPattern = "*"
};

directoryTransferUtility.UploadDirectory(dirPath, bucketName);
This is what worked for me: I set the options on the UploadDirectory method and used "*.*" as the search pattern.
directoryTransferUtility.UploadDirectory(dirPath,
                                         bucketName,
                                         "*.*",
                                         SearchOption.AllDirectories);