Trouble setting cache-cotrol header for Amazon S3 key using boto - django

My Django project uses django_compressor to store JavaScript and CSS files in an S3 bucket via boto via the django-storages package.
The django-storages-related config includes
if 'AWS_STORAGE_BUCKET_NAME' in os.environ:
AWS_STORAGE_BUCKET_NAME = os.environ['AWS_STORAGE_BUCKET_NAME']
AWS_HEADERS = {
'Cache-Control': 'max-age=100000',
'x-amz-acl': 'public-read',
}
AWS_QUERYSTRING_AUTH = False
# This causes images to be stored in Amazon S3
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
# This causes CSS and other static files to be served from S3 as well.
STATICFILES_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
STATIC_ROOT = ''
STATIC_URL = 'https://{0}.s3.amazonaws.com/'.format(AWS_STORAGE_BUCKET_NAME)
# This causes conpressed CSS and JavaScript to also go in S3
COMPRESS_STORAGE = STATICFILES_STORAGE
COMPRESS_URL = STATIC_URL
This works except that when I visit the objects in the S3 management console I see the equals sign in the Cache-Control header has been changed to %3D, as in max-age%3D100000, and this stops caching from working.
I wrote a little script to try to fix this along these lines:
max_age = 30000000
cache_control = 'public, max-age={}'.format(max_age)
con = S3Connection(settings.AWS_ACCESS_KEY_ID, settings.AWS_SECRET_ACCESS_KEY)
bucket = con.get_bucket(settings.AWS_STORAGE_BUCKET_NAME)
for key in bucket.list():
key.set_metadata('Cache-Control', cache_control)
but this does not change the metadata as displayed in Amazon S3 management console.
(Update. The documentation for S3 metadata says
After you upload the object, you cannot modify object metadata. The only way to modify object metadata is to make copy of the object and set the metadata. For more information, go to PUT Object - Copy in the Amazon Simple Storage Service API Reference. You can use the Amazon S3 management console to update the object metadata but internally it makes an object copy replacing the existing object to set the metadata.
so perhaps it is not so surprising that I can’t set the metadata. I assume get_metadata is only used when creating the data in the first place.
end update)
So my questions are, first, can I configure django-storages so that it creates the cache-control header correctly in the first place, and second, is the metadata set with set_metadata the same as the metadata viewed with S3 management console and if not what is the latter and how do I set it programatically?

Use ASCII string as values solves this for me.
AWS_HEADERS = {'Cache-Control': str('public, max-age=15552000')}

If you want to add cache control while uploading the file....
headers = {
'Cache-Control':'max-age=604800', # 60 x 60 x 24 x 7 = 1 week
'Content-Type':content_type,
}
k = Key(self.get_bucket())
k.key = filename
k.set_contents_from_string(contents.getvalue(), headers)
if self.public: k.make_public()
If you want to add cache control to existing files...
for key in bucket.list():
print key.name.encode('utf-8')
metadata = key.metadata
metadata['Cache-Control'] = 'max-age=604800' # 60 x 60 x 24 x 7 = 1 week
key.copy(AWS_BUCKET, key, metadata=metadata, preserve_acl=True)
This is tested in boto 2.32 - 2.40.

A Note for future visitors: Use AWS_S3_OBJECT_PARAMETERS instead of AWS_HEADERS with boto3.
Also, CacheControl instead of Cache-Control.
So finally it will be,
AWS_S3_OBJECT_PARAMETERS = {'CacheControl' : str('max-age=525960')} #one year
Source: https://django-storages.readthedocs.io/en/latest/backends/amazon-S3.html

cache_control is a property of key, not part of metadata.
So to set cache-control for all the objects in a bucket, you can do this:
s3_conn = S3Connection(AWS_KEY, AWS_SECRET)
bucket = s3_conn.get_bucket(AWS_BUCKET_NAME)
bucket.make_public()
for key in bucket.list():
key = bucket.get_key(key.name)
key.cache_control = 'max-age=%d, public' % (3600 * 24 * 360 * 2)
print key.name + ' ' + key.cache_control

Related

Data loss in AWS Lambda function when downloading file from S3 to /tmp

I have written a Lambda function in AWS to download a file from an S3 location to /tmp directory (local Lambda space).
I am able to download the file however, the file size is changing here, not sure why?
def data_processor(event, context):
print("EVENT:: ", event)
bucket_name = 'asr-collection'
fileKey = 'cc_continuous/testing/1645136763813.wav'
path = '/tmp'
output_path = os.path.join(path, 'mydir')
if not os.path.exists(output_path):
os.makedirs(output_path)
s3 = boto3.client("s3")
new_file_name = output_path + '/' + os.path.basename(fileKey)
s3.download_file(
Bucket=bucket_name, Key=fileKey, Filename=output_path + '/' + os.path.basename(fileKey)
)
print('File size is: ' + str(os.path.getsize(new_file_name)))
return None
Output:
File size is: 337964
Actual size: 230MB
downloaded file size is 330KB
I tried download_fileobj() as well
Any idea how can i download the file as it is, without any data loss?
The issue can be that the bucket you are downloading from was from a different region than the Lambda was hosted in. Apparently, this does not make a difference when running it locally.
Check your bucket locations relative to your Lambda region.
Make a note that setting the region on your client will allow you to use a lambda in a different region from your bucket. However if you intend to pull down larger files you will get network latency benefits from keeping your lambda in the same region as your bucket.
Working with S3 resource instance instead of client fixed it.
s3 = boto3.resource('s3')
keys = ['TestFolder1/testing/1651219413148.wav']
for KEY in keys:
local_file_name = '/tmp/'+KEY
s3.Bucket(bucket_name).download_file(KEY, local_file_name)

Django Storage and Boto3 not retrieving Media from AWS S3

I am using a development server to test uploading and retrieving static files from AWS S3 using Django storages and Boto3. The file upload worked but I cannot retrieve the files.
This is what I get:
And when I check out the URL in another tab I get this
**This XML file does not appear to have any style information associated with it. The document tree is shown below.**
<Error>
<Code>IllegalLocationConstraintException</Code>
<Message>The me-south-1 location constraint is incompatible for the region specific endpoint this request was sent to.</Message>
<RequestId></RequestId>
<HostId></HostId>
</Error>
Also I configured the settings.py with my own credentials and IAM user
AWS_ACCESS_KEY_ID = <key>
AWS_SECRET_ACCESS_KEY = <secret-key>
AWS_STORAGE_BUCKET_NAME = <bucket-name>
AWS_DEFAULT_ACL = None
AWS_S3_FILE_OVERWRITE = False
AWS_S3_REGION_NAME = 'me-south-1'
AWS_S3_USE_SSL = True
AWS_S3_VERIFY = False
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage'
Please check in your AWS Identity & Access Management Console (IAM) whether your access keys have proper S3 permissions assigned to them.
Also, make sure you have installed AWS CLI and setup your credentials in your machine.
You can try running the below command and verify it.
$ aws s3 ls
2018-12-11 17:08:50 my-bucket
2018-12-14 14:55:44 my-bucket2
Reference : https://docs.aws.amazon.com/cli/latest/userguide/cli-services-s3-commands.html

How to delete images with suffix from folder in S3 bucket

I have stored multiple sizes of the image on s3.
e.g. image100_100,image200_200,image300_150;
I want to delete the specific size of images like images with suffix 200_200 from the folder. there are a lot of images in this folder so how to delete these images?
Use AWS command-line interface (AWS CLI):
aws s3 rm s3://Path/To/Dir/ --recursive --exclude "*" --include "*200_200"
We first exclude everything, then include what we need to delete. This is a workaround to mimic the behavior of rm -r "*200_200" command in Linux.
The easiest method would be to write a Python script, similar to:
import boto3
BUCKET = 'my-bucket'
PREFIX = '' # eg 'images/'
s3_client = boto3.client('s3', region_name='ap-southeast-2')
# Get a list of objects
list_response = s3_client.list_objects_v2(Bucket = BUCKET, Prefix = PREFIX)
while True:
# Find desired objects to delete
objects = [{'Key':object['Key']} for object in list_response['Contents'] if object['Key'].endswith('200_200')]
print ('Deleting:', objects)
# Delete objects
if len(objects) > 0:
delete_response = s3_client.delete_objects(
Bucket=BUCKET,
Delete={'Objects': objects}
)
# Next page
if list_response['IsTruncated']:
list_response = s3_client.list_objects_v2(
Bucket = BUCKET,
Prefix = PREFIX,
ContinuationToken=list_reponse['NextContinuationToken'])
else:
break

AWS static site file upload via boto 3 set the right content-types

What are the right content types for the different types of files of a static site hosted at AWS and how to set these in a smart way via boto3?
I use the upload_file method:
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('allecijfers.nl')
bucket.upload_file('C:/Hugo/Sites/allecijfers/public/test/index.html', 'test/index.html', ExtraArgs={'ACL': 'public-read', 'ContentType': 'text/html'})
This works well for the html files. I initially left out the ExtraArgs which results in a file download (probably because the content type is binary?). I found this page that states several content types but I am not sure how to apply it.
E.g. probably the CSS files should be uploaded with 'ContentType': 'text/css'.
But what about the js files, index.xml, etc? And how to do this in a smart way? FYI this is my current script to upload from Windows to AWS, this requires string.replace("\","/") which probably is not the smartest either?
for root, dirs, files in os.walk(local_root + local_dir):
for filename in files:
# construct the full local path
local_path = os.path.join(root, filename).replace("\\","/")
# construct the full S3 path
relative_path = os.path.relpath(local_path, local_root)
s3_path = os.path.join(relative_path).replace("\\","/")
bucket.upload_file(local_path, s3_path, ExtraArgs={'ACL': 'public-read', 'ContentType': 'text/html'})
I uploaded my complete Hugo site from the same source using the AWS CLI to the same S3 bucket and this works perfect without specifying content types, is this also possible via boto 3?
Many thanks in advance for your help!
There is a python built-in library to guess mimetypes.
So you could just lookup each filename first. It works like this:
import mimetypes
print(mimetypes.guess_type('filename.html'))
Result:
('text/html', None)
In your code. I also slightly improved the portability of your code with respect to the windows path. Now it will do the same thing, but be portable to a Unix platform by looking up the platform specific separator (os.path.sep) that will be being used in any paths.
import boto3
import mimetypes
s3 = boto3.resource('s3')
bucket = s3.Bucket('allecijfers.nl')
for root, dirs, files in os.walk(local_root + local_dir):
for filename in files:
# construct the full local path (Not sure why you were converting to a
# unix path when you'd want this correctly as a windows path
local_path = os.path.join(root, filename)
# construct the full S3 path
relative_path = os.path.relpath(local_path, local_root)
s3_path = relative_path.replace(os.path.sep,"/")
# Get content type guess
content_type = mimetypes.guess_type(filename)[0]
bucket.upload_file(
File=local_path,
Bucket=bucket,
Key=s3_path,
ExtraArgs={'ACL': 'public-read', 'ContentType': content_type}
)

S3ResponseError: 301 Moved Permanently - Django

I have a very big issue with Amazon S3. I am working on a Django app and I want to store file on S3:
My settings are:
AWS_STORAGE_BUCKET_NAME = 'tfjm2-inscriptions'
AWS_ACCESS_KEY_ID = 'id'
AWS_SECRET_ACCESS_KEY = 'key'
AWS_S3_CUSTOM_DOMAIN = '%s.s3-eu-west-1.amazonaws.com' % AWS_STORAGE_BUCKET_NAME
And I get this error: S3ResponseError: 301 Moved Permanently
Some same issues on the Internet say that it is because it is a non-US bucket and I did tried with a US-standard bucket but it get a 401 Forbidden error.....
I do not know what to do.
Please help me.
Thank you
You can do this:
AWS_S3_CUSTOM_DOMAIN = '%s.s3.amazonaws.com' % AWS_STORAGE_BUCKET_NAME
in terminal run 'nano ~/.boto'
if there is some configs try to comment or rename file and connect again. (it helps me)
http://boto.cloudhackers.com/en/latest/boto_config_tut.html
there is boto config file directories. take a look one by one and clean them all, it will work by default configs. also configs may be in .bash_profile, .bash_source...
I guess you must allow only KEY-SECRET
Solve by changing your code to:
AWS_STORAGE_BUCKET_NAME = 'tfjm2-inscriptions'
AWS_ACCESS_KEY_ID = 'id'
AWS_SECRET_ACCESS_KEY = 'key'
AWS_S3_CUSTOM_DOMAIN = '%s.s3.amazonaws.com' % AWS_STORAGE_BUCKET_NAME
AWS_S3_REGION_NAME = 'us-east-2' ##### Use the region name where your bucket is created
AWS allow you to access a create and access buckets in the same region as an optimization measure and as such if you create a bucket in say 'us-west-2' you'll get a 301 if try accessing it from a different region (Africa, Europe and even East US).
You should specify the region if requesting the bucket from outside its region.