How do you set "Content-Type" when saving to S3 using django-storages with S3boto backend? - django

I am using django-storages with s3boto as a backend.
I have one bucket with two folders - one for static and one for media. I achieve this using django-s3-folder-storage.
As well as saving to S3 using a model, I also want to implement an image-resize-and-cache function to save the files to S3. To do this I interact directly with my S3 bucket. The code works, but the Content-Type isn't set on S3.
In IPython:
In [2]: from s3_folder_storage.s3 import DefaultStorage
In [3]: s3media = DefaultStorage()
In [4]: s3media
Out[4]: <s3_folder_storage.s3.DefaultStorage at 0x4788780>
Test that we're accessing the right bucket (storage_test is something I created earlier):
In [5]: s3media.exists('storage_test')
Out[5]: True
In [6]: s3media.open("test.txt", "w")
Out[6]: <S3BotoStorageFile: test.txt>
In [7]: test = s3media.open("test.txt", "w")
In [8]: test
Out[8]: <S3BotoStorageFile: test.txt>
In [9]: test.key.content_type = "text/plain"
In [10]: test.write("...")
In [11]: test.close()
In [12]: test = s3media.open("test.txt", "w")
In [13]: test.key.content_type
Out[13]: 'binary/octet-stream'
I've also tried, instead of In [9], using test.key.metadata and test.key.set_metadata. Neither of them works.
How do I set the correct Content-Type?

If you go through the source code of class S3BotoStorageFile, in the write function the upload headers are built from only two places:
upload_headers.update(self._storage.headers), where self._storage.headers is taken from AWS_HEADERS
self._storage.default_acl
And in the function _flush_write_buffer only self._storage.headers is considered; check the line headers = self._storage.headers.copy().
So updating test.key.content_type will not work.
Instead of test.key.content_type = "text/plain" at In [9], try test._storage.headers['Content-Type'] = 'text/plain'; it should work.
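Keep in mind that _storage.headers is shared by the storage instance, so the header will apply to every file subsequently written through that storage. A minimal sketch of the adjusted session, assuming the same s3media storage as in the question:
In [9]: test = s3media.open("test.txt", "w")
In [10]: test._storage.headers['Content-Type'] = 'text/plain'
In [11]: test.write("...")
In [12]: test.close()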

This is for Boto3 ONLY, not Boto. If you would like to set those headers, you need to access the object like so (file_ refers to a FileField whose storage is set up to use Boto3 via django-storages):
file_.storage.object_parameters = { 'ContentType': 'text/plain' }
NOTE: it requires header names in CamelCase, so Content-Type becomes ContentType, Content-Disposition becomes ContentDisposition, etc. Hope this helps!
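If you want the same parameters applied to every upload rather than patched onto a single field's storage, recent django-storages releases also read them from settings; a minimal sketch, assuming a version that supports AWS_S3_OBJECT_PARAMETERS (check the docs for your release):
# settings.py
# Applied to every object written through the S3Boto3Storage backend
# (assumption: your django-storages version supports this setting).
AWS_S3_OBJECT_PARAMETERS = {
    'ContentType': 'text/plain',
    'CacheControl': 'max-age=86400',
}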

Now you can use django-storages >= 1.4 and it automatically guesses the mime types.

According to this answer, the Content-Type isn't metadata but rather a header that you set when you upload the file.
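In plain boto that means passing the header at upload time; a minimal sketch, assuming an existing bucket (names are placeholders):
import boto

conn = boto.connect_s3()               # credentials come from the environment or boto config
bucket = conn.get_bucket('my-bucket')  # placeholder bucket name
key = bucket.new_key('test.txt')
# The Content-Type travels as a request header with the upload, not as metadata set afterwards.
key.set_contents_from_string('...', headers={'Content-Type': 'text/plain'})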

I've had a similar issue - I wanted to set my header for all the files uploaded to S3 using django-storages, without relying on the default library approach which is guessing mime type based on the filename.
Note that you can tweak how the header is set; you don't have to hard-code it as I did (my case was specific).
This is what worked for me:
Implement a custom storage class:
import os
from storages.backends.s3boto3 import S3Boto3Storage

class ManagedS3BotoS3Storage(S3Boto3Storage):
    def _save(self, name, content):
        cleaned_name = self._clean_name(name)
        name = self._normalize_name(cleaned_name)
        params = self._get_write_parameters(name, content)
        content_type = "application/octet-stream"  # the Content-Type I wanted for each file
        params["ContentType"] = content_type
        encoded_name = self._encode_name(name)
        obj = self.bucket.Object(encoded_name)
        if self.preload_metadata:
            self._entries[encoded_name] = obj
        content.seek(0, os.SEEK_SET)
        obj.upload_fileobj(content, ExtraArgs=params)
        return cleaned_name
Use ManagedS3BotoS3Storage in the model:
class SomeCoolModel(models.Model):
    file = models.FileField(
        storage=ManagedS3BotoS3Storage(bucket="my-awesome-S3-bucket"),
        upload_to="my_great_path_to_file",
    )
Run python manage.py makemigrations.
That's it; after this, all the files I uploaded were stored with Content-Type: "application/octet-stream".
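To confirm the header actually landed on the object, a quick check with boto3 (bucket and key names are placeholders):
import boto3

s3 = boto3.client('s3')
resp = s3.head_object(Bucket='my-awesome-S3-bucket', Key='my_great_path_to_file/example.bin')
print(resp['ContentType'])  # expect 'application/octet-stream'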

Related

Creating a dmarc parser using parsedmarc in python3 for use in AWS s3

I am very new to programming. I am working on a pipeline to analyze DMARC report files that are sent to my email account and that I manually place in an S3 bucket. The goal is to download, extract, and analyze the files using parsedmarc: https://github.com/domainaware/parsedmarc The part I'm having difficulty with is setting up a conditional statement to extract .gz files when the target file is not a .zip file. I'm assuming the gzip library will be sufficient for this purpose. Here is the code I have so far. I'm using Python 3 and the boto3 library for AWS. Any help is appreciated!
import parsedmarc
import pprint
import json
import boto3
import zipfile
import gzip

pp = pprint.PrettyPrinter(indent=2)

def main():
    # Set default session profile and region for sandbox account. Access keys are pulled from ~/.aws/config and ~/.aws/credentials.
    # The 'profile_name' value comes from the header for the account in question in ~/.aws/config and ~/.aws/credentials
    boto3.setup_default_session(region_name="aws-region-goes-here")
    boto3.setup_default_session(profile_name="aws-account-profile-name-goes-here")
    # Define the s3 resource, the bucket name, and the file to download. It's hardcoded for now...
    s3_resource = boto3.resource('s3')
    s3_resource.Bucket('dmarc-parsing').download_file('source-dmarc-report-filename.zip', '/home/user/dmarc/parseme.zip')
    # Use the zipfile python library to extract the file into its raw state.
    with zipfile.ZipFile('/home/user/dmarc/parseme.zip', 'r') as zip_ref:
        zip_ref.extractall('/home/user/dmarc')
    # Ingest all locations for xml file source
    dmarc_report_directory = '/home/user/dmarc/'
    dmarc_report_file = 'parseme.xml'
    """I need an if statement here for extracting .gz files if the file type is not .zip. The contents of every archive are .xml files"""
    # Set report output variables using functions in parsedmarc. Variable set to equal the output
    pd_report_output = parsedmarc.parse_aggregate_report_file(_input=f"{dmarc_report_directory}{dmarc_report_file}")
    # use jsonify to make the output in json format
    pd_report_jsonified = json.loads(json.dumps(pd_report_output))
    dkim_status = pd_report_jsonified['records'][0]['policy_evaluated']['dkim']
    spf_status = pd_report_jsonified['records'][0]['policy_evaluated']['spf']
    if dkim_status == 'fail' or spf_status == 'fail':
        print(f"{dmarc_report_file} reports failure. oh crap. report:")
    else:
        print(f"{dmarc_report_file} passes. great. report:")
    pp.pprint(pd_report_jsonified['records'][0]['auth_results'])

if __name__ == "__main__":
    main()
Here is the code using the parsedmarc.parse_aggregate_report_xml method I found. Hope this helps others in parsing these reports:
import parsedmarc
import pprint
import json
import boto3
import zipfile
import gzip

pp = pprint.PrettyPrinter(indent=2)

def main():
    # Set default session profile and region for account. Access keys are pulled from ~/.aws/config and ~/.aws/credentials.
    # The 'profile_name' value comes from the header for the account in question in ~/.aws/config and ~/.aws/credentials
    boto3.setup_default_session(profile_name="aws_profile_name_goes_here", region_name="region_goes_here")
    source_file = 'filename_in_s3_bucket.zip'
    destination_directory = '/tmp/'
    destination_file = 'compressed_report_file'
    # Define the s3 resource, the bucket name, and the file to download. It's hardcoded for now...
    s3_resource = boto3.resource('s3')
    s3_resource.Bucket('bucket-name-for-dmarc-report-files').download_file(source_file, f"{destination_directory}{destination_file}")
    # Extract xml
    outputxml = parsedmarc.extract_xml(f"{destination_directory}{destination_file}")
    # run parsedmarc analysis & convert output to json
    pd_report_output = parsedmarc.parse_aggregate_report_xml(outputxml)
    pd_report_jsonified = json.loads(json.dumps(pd_report_output))
    # loop through results and find relevant status info and pass/fail status
    dmarc_report_status = ''
    for record in pd_report_jsonified['records']:
        if False in record['alignment'].values():
            dmarc_report_status = 'Failed'
            # ************ add logic for interpreting results
    # if fail, publish to sns
    if dmarc_report_status == 'Failed':
        message = "Your dmarc report failed at least one check. Review the log for details"
        sns_resource = boto3.resource('sns')
        sns_topic = sns_resource.Topic('arn:aws:sns:us-west-2:112896196555:TestDMARC')
        sns_publish_response = sns_topic.publish(Message=message)

if __name__ == "__main__":
    main()
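For reference, a minimal sketch of the .zip-versus-.gz branch the original question asked about (paths are the question's placeholders, and the .gz case assumes a single gzip-compressed XML report):
import gzip
import shutil
import zipfile

archive_path = '/home/user/dmarc/parseme.zip'  # may actually be a .gz file
extract_dir = '/home/user/dmarc/'

if zipfile.is_zipfile(archive_path):
    with zipfile.ZipFile(archive_path, 'r') as zip_ref:
        zip_ref.extractall(extract_dir)
else:
    # Not a zip archive: treat it as gzip and stream the decompressed XML out.
    with gzip.open(archive_path, 'rb') as gz_ref, open(extract_dir + 'parseme.xml', 'wb') as out_file:
        shutil.copyfileobj(gz_ref, out_file)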

AWS static site file upload via boto 3 set the right content-types

What are the right content types for the different types of files of a static site hosted at AWS and how to set these in a smart way via boto3?
I use the upload_file method:
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('allecijfers.nl')
bucket.upload_file('C:/Hugo/Sites/allecijfers/public/test/index.html', 'test/index.html', ExtraArgs={'ACL': 'public-read', 'ContentType': 'text/html'})
This works well for the html files. I initially left out the ExtraArgs which results in a file download (probably because the content type is binary?). I found this page that states several content types but I am not sure how to apply it.
E.g. probably the CSS files should be uploaded with 'ContentType': 'text/css'.
But what about the js files, index.xml, etc.? And how to do this in a smart way? FYI, this is my current script to upload from Windows to AWS; it requires .replace("\\", "/"), which probably isn't the smartest either?
for root, dirs, files in os.walk(local_root + local_dir):
    for filename in files:
        # construct the full local path
        local_path = os.path.join(root, filename).replace("\\", "/")
        # construct the full S3 path
        relative_path = os.path.relpath(local_path, local_root)
        s3_path = os.path.join(relative_path).replace("\\", "/")
        bucket.upload_file(local_path, s3_path, ExtraArgs={'ACL': 'public-read', 'ContentType': 'text/html'})
I uploaded my complete Hugo site from the same source to the same S3 bucket using the AWS CLI, and that works perfectly without specifying content types; is this also possible via boto3?
Many thanks in advance for your help!
There is a python built-in library to guess mimetypes.
So you could just lookup each filename first. It works like this:
import mimetypes
print(mimetypes.guess_type('filename.html'))
Result:
('text/html', None)
Applied to your code below. I also slightly improved the portability with respect to the Windows paths: it now does the same thing, but is portable to a Unix platform, by looking up the platform-specific separator (os.path.sep) used in the local paths.
import os
import boto3
import mimetypes

s3 = boto3.resource('s3')
bucket = s3.Bucket('allecijfers.nl')

for root, dirs, files in os.walk(local_root + local_dir):
    for filename in files:
        # construct the full local path (no need to convert it to a Unix-style
        # path here; keep it as a native Windows path)
        local_path = os.path.join(root, filename)
        # construct the full S3 path
        relative_path = os.path.relpath(local_path, local_root)
        s3_path = relative_path.replace(os.path.sep, "/")
        # Get content type guess
        content_type = mimetypes.guess_type(filename)[0]
        bucket.upload_file(
            local_path,
            s3_path,
            ExtraArgs={'ACL': 'public-read', 'ContentType': content_type},
        )
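One caveat: mimetypes.guess_type returns (None, None) for unknown extensions, and boto3 will reject ContentType=None, so it is worth adding a fallback (the default chosen here is just an assumption):
content_type = mimetypes.guess_type(filename)[0] or 'application/octet-stream'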

'foo.bar.com.s3.amazonaws.com' doesn't match either of '*.s3.amazonaws.com', 's3.amazonaws.com'

I'm using django, and things like imgs I store at s3 (for this I'm using boto), but recently I got this error:
'foo.bar.com.s3.amazonaws.com' doesn't match either of
'*.s3.amazonaws.com', 's3.amazonaws.com'
I've been searching for a solution for about two days, but the only suggestion I can find is to change boto's source code, which I can't do in production.
Edit: Using Django 1.58, Boto 2.38.0
Any help would be appreciated.
Thx in advance.
As has been said, the problem occurs with buckets that contain dots in their name. Below is an example that avoids it.
import boto
from boto.s3.connection import VHostCallingFormat

c = boto.connect_s3(aws_access_key_id='your-access-key',
                    is_secure=False,
                    aws_secret_access_key='your-secret-access',
                    calling_format=VHostCallingFormat())
b = c.get_bucket(bucket_name='your.bucket.with.dots', validate=True)
print(b)
You can use this monkey patch in your connection.py file (boto/connection.py):
import ssl

_old_match_hostname = ssl.match_hostname

def _new_match_hostname(cert, hostname):
    if hostname.endswith('.s3.amazonaws.com'):
        pos = hostname.find('.s3.amazonaws.com')
        hostname = hostname[:pos].replace('.', '') + hostname[pos:]
    return _old_match_hostname(cert, hostname)

ssl.match_hostname = _new_match_hostname
(source)
Another solution is here:
It's a known issue: #2836. It's due to the dots in your bucket name.
I had this issue some days ago. A user seems to have succeeded in fixing it by setting:
AWS_S3_HOST = 's3-eu-central-1.amazonaws.com'
AWS_S3_CALLING_FORMAT = 'boto.s3.connection.OrdinaryCallingFormat'
But it didn't work for me.
Otherwise, you can create a bucket without dots (e.g. foo-bar-com). It will work. This is what I did to temporarily fix the issue.
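For completeness, a minimal sketch of forcing path-style addressing directly in boto, which sidesteps the wildcard-certificate mismatch for dotted bucket names (credentials come from your boto config or environment):
import boto
from boto.s3.connection import OrdinaryCallingFormat

conn = boto.connect_s3(calling_format=OrdinaryCallingFormat())
bucket = conn.get_bucket('your.bucket.with.dots')
print(bucket)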
I had the same error when I moved from python3.4 to python3.5 (boto version remained the same).
The problem was solved by moving from boto to boto3 package.
Example:
import boto3
s3_client = boto3.client("s3", aws_access_key_id=aws_access_key, aws_secret_access_key=aws_secret_key)
body = open(local_filename, 'rb').read()
resp = s3_client.put_object(ACL="private", Body=body, Bucket=your_bucket_name, Key=path_to_the_file_inside_the_bucket)
if resp["ResponseMetadata"]["HTTPStatusCode"] != 200:
    raise Exception("Something went wrong...")

Issue with uploading files from local directory to aws S3 using python 2.7 and boto 2

I'm doing a simple operation of downloading gzip files from an S3 bucket to a local directory. I extract them into another local directory and then upload them back to the S3 bucket, into an archive folder path. While doing this I want to make sure I process the same set of files that I initially downloaded from the S3 bucket, which is f_name in the code below. Right now the code is not uploading them back to S3; that's where I'm stuck. Downloading from S3 and extracting into the local directory works. Can you please help me understand what is wrong with the _uploadFile function?
from boto.s3.connection import S3Connection
from boto.s3.key import *
import os
import os.path
import re

aws_bucket = "event-logs-dev"  ## S3 bucket name
local_download_directory = "/Users/TargetData/Download/test_queue1/"  ## local directory to download the gzip files from S3.
Target_directory_to_extract = "/Users/TargetData/unzip"  ## local directory to gunzip the downloaded files.
Target_s3_path_to_upload = "event-logs-dev/data/clean/xact/logs/archive/"  ## S3 bucket path to upload the files.

def decompressAllFilesFromNetfiler(self, aws_bucket, local_download_directory, Target_directory_to_extract, Target_s3_path_to_upload):
    zipFiles = [f for f in os.listdir(local_download_directory) if re.match(r'.*\.tar\.gz', f)]
    for f_name in zipFiles:
        if os.path.exists(Target_directory_to_extract + "/" + f_name[:-len('.tar.gz')]) and os.access(Target_directory_to_extract + "/" + f_name[:-len('.tar.gz')], os.R_OK):
            print ('File {} already exists!'.format(f_name))
        else:
            f_name_with_path = os.path.join(local_download_directory, f_name)
            os.system('mkdir -p {} && tar vxzf {} -C {}'.format(Target_directory_to_extract, f_name_with_path, Target_directory_to_extract))
            print ('Extracted file {}'.format(f_name))
            self._uploadFile(aws_bucket, f_name, Target_s3_path_to_upload, Target_directory_to_extract)

def _uploadFile(self, aws_bucket, f_name, Target_s3_path_to_upload, Target_directory_to_extract):
    full_key_name = os.path.expanduser(os.path.join(Target_s3_path_to_upload, f_name))
    path = os.path.expanduser(os.path.join(Target_directory_to_extract, f_name))
    try:
        print "Uploaded extracted file to: %s" % (full_key_name)
        key = aws_bucket.new_key(full_key_name)
        key.set_contents_from_filename(path)
    except:
        if full_key_name is None:
            print "Error uploading"
Currently, the output prints that Uploaded extracted file to: event-logs-dev/data/clean/xact/logs/archive/1442235602129200000.tar.gz, but nothing is uploaded to S3 bucket. Your help is greatly appreciated!! Thank you in advance!
It appears that you have cut and pasted parts of your code, and the formatting was lost; the code above will not run as pasted. I've taken the liberty of making it (mostly) PEP8-compliant, however there is still some missing code to create the S3 objects. Since you import the modules, I presume you have that section of code and just didn't paste it.
Here is a cleaned-up version of your code, formatted correctly. I also added an exception handler to your try: block to print out the error you get. You should update the Exception to be more specific to the exceptions thrown by new_key or set_contents_from_filename, but the general Exception will get you started. If nothing else this is more readable, but you should include your S3 connection code too, and remove anything that is specific to your domain (e.g. keys, trade secrets, etc.).
#!/usr/bin/env python
"""
do some download
some extract
and some upload
"""
from boto.s3.connection import S3Connection
from boto.s3.key import *
import os
import os.path
import re

aws_bucket = 'event-logs-dev'
local_download_directory = '/Users/TargetData/Download/test_queue1/'
Target_directory_to_extract = '/Users/TargetData/unzip'
Target_s3_path_to_upload = 'event-logs-dev/data/clean/xact/logs/archive/'

'''
MUST BE SOME MAGIC HERE TO GET AN S3 CONNECTION ???
aws_bucket IS NOT A BUCKET OBJECT ...
'''


def decompressAllFilesFromNetfiler(self,
                                   aws_bucket,
                                   local_download_directory,
                                   Target_directory_to_extract,
                                   Target_s3_path_to_upload):
    '''
    decompress stuff
    '''
    zipFiles = [f for f in os.listdir(
        local_download_directory) if re.match(r'.*\.tar\.gz', f)]
    for f_name in zipFiles:
        extracted_path = "{}/{}".format(Target_directory_to_extract,
                                        f_name[:-len('.tar.gz')])
        if os.path.exists(extracted_path) and os.access(extracted_path, os.R_OK):
            print ('File {} already exists!'.format(f_name))
        else:
            f_name_with_path = os.path.join(local_download_directory, f_name)
            os.system('mkdir -p {} && tar vxzf {} -C {}'.format(
                Target_directory_to_extract,
                f_name_with_path,
                Target_directory_to_extract))
            print ('Extracted file {}'.format(f_name))
            self._uploadFile(aws_bucket,
                             f_name,
                             Target_s3_path_to_upload,
                             Target_directory_to_extract)


def _uploadFile(self,
                aws_bucket,
                f_name,
                Target_s3_path_to_upload,
                Target_directory_to_extract):
    full_key_name = os.path.expanduser(os.path.join(Target_s3_path_to_upload,
                                                    f_name))
    path = os.path.expanduser(os.path.join(Target_directory_to_extract, f_name))
    try:
        S3CONN = S3Connection()
        BUCKET = S3CONN.get_bucket(aws_bucket)
        key = BUCKET.new_key(full_key_name)
        key.set_contents_from_filename(path)
        print "Uploaded extracted file to: {}".format(full_key_name)
    except Exception as UploadERR:
        if full_key_name is None:
            print 'Error uploading'
        else:
            print "Error : {}".format(UploadERR)
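For the 'MUST BE SOME MAGIC HERE' placeholder, a minimal sketch of turning the bucket name into an actual Bucket object, which is what the original _uploadFile expected to receive (credentials come from your boto config or environment):
from boto.s3.connection import S3Connection

s3_conn = S3Connection()                           # or S3Connection(access_key, secret_key)
bucket_obj = s3_conn.get_bucket('event-logs-dev')  # pass this object around, not the name string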

django boto3: NoCredentialsError -- Unable to locate credentials

I am trying to use boto3 in my django project to upload files to Amazon S3. Credentials are defined in settings.py:
AWS_ACCESS_KEY = xxxxxxxx
AWS_SECRET_KEY = xxxxxxxx
S3_BUCKET = xxxxxxx
In views.py:
import boto3
s3 = boto3.client('s3')
path = os.path.dirname(os.path.realpath(__file__))
s3.upload_file(path+'/myphoto.png', S3_BUCKET, 'myphoto.png')
The system complains about Unable to locate credentials. I have two questions:
(a) It seems that I am supposed to create a credential file ~/.aws/credentials. But in a django project, where do I have to put it?
(b) The s3 method upload_file takes a file path/name as its first argument. Is it possible that I provide a file stream obtained by a form input element <input type="file" name="fileToUpload">?
This is what I use for a direct upload; I hope it provides some assistance.
import boto
from boto.exception import S3CreateError
from boto.s3.connection import S3Connection
from django.conf import settings

conn = S3Connection(settings.AWS_ACCESS_KEY,
                    settings.AWS_SECRET_KEY,
                    is_secure=True)
try:
    bucket = conn.create_bucket(settings.S3_BUCKET)
except S3CreateError as e:
    bucket = conn.get_bucket(settings.S3_BUCKET)

k = boto.s3.key.Key(bucket)
k.key = filename
k.set_contents_from_filename(filepath)
Not sure about (a), but Django is very flexible with file management.
Regarding (b), you can also sign the upload and do it directly from the client to reduce bandwidth usage; it's quite sneaky and secure too. You need to use some JavaScript to manage the upload. If you want details I can include them here.
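For the boto3 route the question actually asks about, a minimal sketch, assuming the settings shown in the question. Passing the keys explicitly answers (a), since no ~/.aws/credentials file is needed; and for (b), upload_fileobj accepts any file-like object, so a Django UploadedFile can be streamed straight to S3:
import boto3
from django.conf import settings

s3 = boto3.client(
    's3',
    aws_access_key_id=settings.AWS_ACCESS_KEY,
    aws_secret_access_key=settings.AWS_SECRET_KEY,
)

def handle_upload(request):
    uploaded = request.FILES['fileToUpload']  # file-like object from the form input
    s3.upload_fileobj(uploaded, settings.S3_BUCKET, uploaded.name)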