Viewing S3 files in the browser - Django

I have created a bucket in S3 and successfully uploaded files to it with django-storages. However, when I try to access the files in the browser, I get the following error:
IllegalLocationConstraintException
The eu-south-1 location constraint is incompatible for the region specific endpoint
this request was sent to.
I have also realised I do not have the region name included in my URL (https://docs.s3.amazonaws.com/media/admin/2.pdf...).
Could that be the problem?
If so, how do I set it to append the region name?
What could be missing here?

TL;DR
Set AWS_S3_ENDPOINT_URL to https://s3.your-region-name.amazonaws.com in settings.py. If you need to specify an alternate region in an override of S3Boto3Storage for a particular field, set the endpoint_url attribute to s3.your-alternate-region-name.amazonaws.com.
Explanation
I finally figured this out after an embarrassing number of hours. According to this comment on a boto3 repo issue, if a region was launched after 20 March 2019 (which both eu-south-1 and af-south-1, the region I am using, were), then S3 requests are routed differently. Read the comment, but in short, to fix this you need to specify which region the request is going to, like so:
This URL style works for all regions but the ones launched after 20 March 2019: bucket-name.s3.amazonaws.com/file_key.txt. Don't use this one.
For the regions launched after 20 March 2019, the URL needs to include the region name between the .s3 and .amazonaws.com parts, like so: bucket-name.s3.your-region-name.amazonaws.com/file_key.txt. Note that this style is backwards compatible and works with all S3 regions. Use this one.
This means that we need to explicitly set the endpoint_url for these regions. Keep in mind that the addressing_style attribute for django-storages is set to None, meaning that boto3 will use the value path for this attribute. As a result, if we set endpoint_url to bucket-name.s3.your-region-name.amazonaws.com on an S3Boto3Storage override class (like below), boto3 will prepend the bucket_name attribute to every S3 key. What you end up with is something like bucket-name.s3.your-region-name.amazonaws.com/bucket-name/file_key.txt when we obviously only want bucket-name.s3.your-region-name.amazonaws.com/file_key.txt. This is not documented in django-storages.
class IncorrectStorageSetup(S3Boto3Storage):
    bucket_name = "bucket-name"
    endpoint_url = "bucket-name.s3.your-region-name.amazonaws.com"
    # addressing_style = None -> defaults to path-type `addressing_style`.
Here is how to fix this.
Use path addressing style by leaving AWS_S3_ADDRESSING_STYLE at its default of None in settings.py, and set AWS_S3_ENDPOINT_URL to https://s3.your-region-name.amazonaws.com. All URLs will then take the correct form of s3.your-region-name.amazonaws.com/bucket-name/file_key.txt. Now, every time you override S3Boto3Storage you only need to set the bucket_name attribute, provided the bucket is in the region you set in AWS_S3_ENDPOINT_URL above. If you want to use another region, explicitly set the endpoint_url attribute of the class to https://s3.your-other-region-name.amazonaws.com as well as the bucket_name attribute.
Note there is another way to fix this by using the converse, i.e. setting AWS_S3_ADDRESSING_STYLE to virtual with everything else configured the same. It should do the same thing, but you need to explicitly set AWS_S3_ADDRESSING_STYLE, which is one more step than the approach above.
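For illustration, here is a minimal sketch of the path-style setup described above; the bucket names below are placeholders, not values from the question:

# settings.py
AWS_S3_ENDPOINT_URL = "https://s3.af-south-1.amazonaws.com"  # your default region
# AWS_S3_ADDRESSING_STYLE is left at its default of None (path-style addressing).

# storages.py
from storages.backends.s3boto3 import S3Boto3Storage

class DefaultRegionStorage(S3Boto3Storage):
    # Bucket living in the region configured via AWS_S3_ENDPOINT_URL above.
    bucket_name = "my-default-bucket"

class OtherRegionStorage(S3Boto3Storage):
    # Bucket in a different region: set both attributes explicitly.
    bucket_name = "my-other-bucket"
    endpoint_url = "https://s3.eu-south-1.amazonaws.com"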

AWS S3 file with same name does not get overwritten but gets characters added at the end of the filename

Below is an example for my scenario,
I have a Django API which allows users to upload images to a certain directory; the images will be stored in an S3 bucket. Let's say the file name is 'example.jpeg'.
The user again uploads an image with the same name 'example.jpeg' to the same directory.
Both of them correctly show up in the same directory, but the second one gets additional characters at the end of the filename, like this: 'example_785PmrM.jpeg'. I suspect the additional characters are added by S3, but my research says S3 will overwrite a file with the same name.
How can I enable the overwrite feature? I haven't seen any option for this.
Thanks
S3 itself does not change a key on its own. The only option I see that could be impacting this is Django's storage backend for S3:
AWS_S3_FILE_OVERWRITE (optional: default is True)
By default files with the same name will overwrite each other. Set this to False to have extra characters appended.
So you should set AWS_S3_FILE_OVERWRITE to True to prevent this behavior.
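For reference, this is a one-line change in your Django settings (assuming django-storages is already configured):

# settings.py
# Uploads with the same name replace the existing object instead of
# getting a suffix appended.
AWS_S3_FILE_OVERWRITE = True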
Depending on your exact needs, consider enabling S3 versioning so you can access previous versions of objects as they're overwritten in S3 in the future.
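If you go that route, versioning is enabled per bucket; a minimal boto3 sketch (bucket name is a placeholder):

import boto3

s3 = boto3.client('s3')

# Keep previous versions of objects whenever they are overwritten.
s3.put_bucket_versioning(
    Bucket='my-bucket',
    VersioningConfiguration={'Status': 'Enabled'}
)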

S3/Athena query result location and “Invalid S3 folder location”

Are there particular requirements to the bucket for specifying the query result location? When I try to create a new table, I get a popup:
Before you run your first query, you need to set up a query result location in Amazon S3. Learn more
So I click the link and specify my query result location in the format specified, s3://query-results-bucket/folder, but it always says:
Invalid S3 folder location
I posted this in Superuser first but it was closed (not sure why...).
The folder name needs to have a trailing slash:
s3://query-results-bucket/folder/
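The same rule applies if you set the result location programmatically rather than in the console; a small boto3 sketch (bucket and folder names are placeholders):

import boto3

athena = boto3.client('athena')

# Note the trailing slash on the output location.
athena.start_query_execution(
    QueryString='SELECT 1',
    ResultConfiguration={'OutputLocation': 's3://query-results-bucket/folder/'}
)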
Ran into this earlier in the week.
First, make sure the bucket exists. There doesn't appear to be an option to create the bucket when setting the value in the Athena console.
Next, make sure you have the bucket specified properly. In my case, I initially had s3:/// - there is no validation, so an extra character will cause this error. If you go to the Athena settings, you can see what the bucket settings look like.
Finally, check the workgroup - there is a default workgroup per account; make sure it's not disabled. You can create additional workgroups, each of which will need its own settings.
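To double-check those last two points from code, you can inspect the workgroup; a hedged boto3 sketch (the workgroup name assumes the account default, 'primary'):

import boto3

athena = boto3.client('athena')

wg = athena.get_work_group(WorkGroup='primary')
# Shows the workgroup state and its configured query result location, if any.
print(wg['WorkGroup']['State'])
print(wg['WorkGroup']['Configuration']['ResultConfiguration'].get('OutputLocation'))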

S3 versioning with django-storages

I use django-storages in my app, and I want to use S3 versioning just out of the box. I mean, I want to store and retrieve different versions of the same file without implementing any additional mechanism.
By reading the docs, I understand that:
Retrieving an object is as easy as adding a version=XX parameter to the GET request.
Uploading an object is handled by S3 itself. You just need to configure versioning in your bucket.
But, going to the code, if I have this:
from django.db import models
from django_s3_storage.storage import S3Storage

storage = S3Storage(aws_s3_bucket_name='test_bucket')

class Document(models.Model):
    name = models.CharField(max_length=255)
    s3_file = models.FileField(storage=storage)
How can I get one specific version of a Document? Something like:
doc = Document.objects.get(pk=XX)
doc.s3_file.read(version=XXXXXXXXXX) # Something like this?
I've been reading the official documentation, but can't find how to:
Get available versions of an object
Retrieve one specific version
EDIT: Reading the source code, I understand I could use parameters in the url() call, but I'm not sure which parameter (version?) to use or how to get the existing versions of an object.
Any help is appreciated.
OK, the comment from @dirkgroten was pretty accurate. So:
get-object using version-id to get a specific version of an object
list-object-versions to get all available versions of an object
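Since the storage backend doesn't expose this directly, one option is to drop down to boto3 for those two calls; a minimal sketch (the bucket, key, and version ID are placeholders):

import boto3

s3 = boto3.client('s3')

# List all available versions of one object.
versions = s3.list_object_versions(Bucket='test_bucket', Prefix='path/to/file.pdf')
for v in versions.get('Versions', []):
    print(v['VersionId'], v['LastModified'], v['IsLatest'])

# Retrieve one specific version.
obj = s3.get_object(Bucket='test_bucket', Key='path/to/file.pdf', VersionId='XXXXXXXXXX')
data = obj['Body'].read()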

Specify Maximum File Size while uploading a file in AWS S3

I am creating temporary credentials via AWS Security Token Service (AWS STS).
And I am using these credentials to upload a file to S3 from the S3 Java SDK.
I need some way to restrict the size of the file upload.
I was trying to add a policy (with s3:content-length-range) while creating the user, but that doesn't seem to work.
Is there any other way to specify the maximum file size a user can upload?
An alternative method would be to generate a pre-signed URL instead of temporary credentials. It will be good for one file with a name you specify. You can also force a content-length range when you generate the URL. Your user will get the URL and will have to use a specific method (POST/PUT/etc.) for the request. They set the content while you set everything else.
I'm not sure how to do that with Java (it doesn't seem to have support for conditions), but it's simple with Python and boto3:
import boto3

# Get the service client
s3 = boto3.client('s3')

# Keep the uploaded object private
fields = {"acl": "private"}

# Ensure that the ACL isn't changed and restrict the upload to a length
# between 10 and 100 bytes.
conditions = [
    {"acl": "private"},
    ["content-length-range", 10, 100]
]

# Generate the POST attributes
post = s3.generate_presigned_post(
    Bucket='bucket-name',
    Key='key-name',
    Fields=fields,
    Conditions=conditions
)
When testing this, make sure every single header item matches or you'll get vague access-denied errors. It can take a while to match it completely.
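For completeness, the client side would then POST the file using the returned URL and fields; roughly like this (using the requests library, the file name is a placeholder):

import requests

# 'post' is the dict returned by generate_presigned_post above.
with open('file_to_upload.txt', 'rb') as f:
    response = requests.post(
        post['url'],
        data=post['fields'],
        files={'file': f}
    )

# S3 rejects the upload (HTTP 4xx) if the body falls outside the
# 10-100 byte content-length-range condition.
print(response.status_code)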
I believe there is no way to limit the object size before uploading, and reacting to that would be quite hard. A workaround would be to create an S3 event notification that triggers your code, through a Lambda function or SNS topic. That could validate or delete the object and notify the user, for example.
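A rough sketch of such a Lambda handler (the size limit and the delete-on-violation reaction are assumptions, not the only possible behaviour):

import boto3

s3 = boto3.client('s3')
MAX_SIZE = 100  # bytes; adjust to your limit

def handler(event, context):
    # Invoked by an S3 ObjectCreated event notification.
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        size = record['s3']['object']['size']
        if size > MAX_SIZE:
            # Object is too large: remove it (and notify the user here).
            s3.delete_object(Bucket=bucket, Key=key)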

AWS CloudFront Behavior

I've been setting up AWS Lambda functions for S3 events. I want to set up a new structure for my bucket, but that's not possible, so I set up a new bucket the way I want and will migrate old things and send new things there. I wanted to keep some of the structure the same under a given base folder name: old-bucket/images and new-bucket/images. I set up CloudFront to serve from old-bucket/images now, but I wanted to add new-bucket/images as well. I thought the Behaviors tab would let me set it up so that it would check new-bucket/images first, then old-bucket/images. Alas, that didn't work. If the object wasn't found in the first, that was the end of the line.
Am I misunderstanding how behaviors work? Has anyone attempted anything like this?
That is expected behavior. An origin tells Amazon CloudFront where to obtain the data to serve to users, based upon a prefix, suffix, etc.
For example, you could serve old-bucket/* from one Amazon S3 bucket, while serving new-bucket/* from a different bucket.
However, there is no capability to fall back to a different origin if a file is not found.
You could check for the existence of files before serving the link, and then provide a different link depending upon where the files are stored. Otherwise, you'll need to put all of your files in the location that matches the link you are serving.
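For example, a hedged sketch of that existence check (the bucket names and CloudFront domains are placeholders):

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')

def image_url(key):
    # Prefer the new bucket; fall back to the old one if the key isn't there.
    try:
        s3.head_object(Bucket='new-bucket', Key=key)
        return 'https://new-distribution.example.com/' + key
    except ClientError:
        return 'https://old-distribution.example.com/' + key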