Django AWS S3 download optimisation

I work on a Django + DRF project that has an application for uploading media files to and downloading them from AWS S3 (the application acts as a kind of proxy, so the frontend does not access AWS S3 directly to download images). I noticed that my download requests are not very fast (500ms ±100ms), and in production this will probably be a problem, so my question is: "Is there a way to make these requests faster, or to separate the download logic into an async microservice or multiprocess task? What is the best practice?"
The service that downloads images in the current state of the project (for context):
# media_app/services/download/image.py
from media_app.services.download.base import FileDownloadServiceBase
from media_app.models import Image


class ImageDownloadService(FileDownloadServiceBase):
    model = Image


# media_app/services/download/base.py
from typing import Tuple

import boto3
import requests
from django.conf import settings


class FileDownloadServiceBase:
    model = ...

    def __init__(self, instance) -> None:
        self.instance = instance

    def _get_file(self, presigned_url):
        response = requests.get(url=presigned_url, stream=True)
        return response

    def download(self) -> Tuple[bytes, int]:
        s3 = boto3.resource(
            service_name='s3',
            aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
            aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY,
            region_name=settings.AWS_S3_REGION_NAME,
        )
        url = s3.meta.client.generate_presigned_url(
            ClientMethod="get_object", ExpiresIn=3600,
            Params={
                "Bucket": settings.AWS_STORAGE_BUCKET_NAME,
                "Key": f'media/public/{self.instance.file_name}',
            },
        )
        response = self._get_file(url)
        return response.content, response.status_code
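For context on where the time likely goes: every call to download() constructs a fresh boto3 resource (new connection pool and credential resolution) and then buffers the whole object into memory via response.content before returning it. Below is a minimal sketch of two common mitigations, assuming the same settings names as above (hypothetical code, not the project's actual service): create the client once per process, and stream chunks to the caller instead of buffering.

import boto3
import requests
from django.conf import settings
from django.http import StreamingHttpResponse

# Created once at import time rather than per request, so the TLS
# handshake and credential lookup are not repeated on every download.
_S3_CLIENT = boto3.client(
    's3',
    aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
    aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY,
    region_name=settings.AWS_S3_REGION_NAME,
)

def stream_file(file_name):
    url = _S3_CLIENT.generate_presigned_url(
        ClientMethod='get_object',
        ExpiresIn=3600,
        Params={
            'Bucket': settings.AWS_STORAGE_BUCKET_NAME,
            'Key': f'media/public/{file_name}',
        },
    )
    upstream = requests.get(url, stream=True)
    # Forward chunks as they arrive instead of buffering response.content.
    return StreamingHttpResponse(
        upstream.iter_content(chunk_size=64 * 1024),
        status=upstream.status_code,
        content_type=upstream.headers.get('Content-Type'),
    )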

Related

Share JPEG file stored on S3 via URL instead of downloading

I have recently completed this tutorial from AWS on how to create a thumbnail generator using Lambda and S3: https://docs.aws.amazon.com/lambda/latest/dg/with-s3-tutorial.html . Basically, I upload an image file to my '-source' bucket and Lambda generates a thumbnail and uploads it to my '-thumbnail' bucket.
Everything works as expected. However, I wanted to use the S3 object URL in the '-thumbnail' bucket so that I can load the image from there for a small app I'm building. The issue I'm having is that the URL doesn't display the image in the browser but instead downloads the file. This causes my app to error out.
I did some research and learned that I had to change the content type to image/jpeg and also make the object public using an ACL. This works for all of the other buckets I have except the one that holds the thumbnails. I have recreated this bucket several times. I even copied the settings from my existing buckets. I have compared settings with all the other buckets and they appear to be the same.
I wanted to reach out and see if anyone has run into this type of issue before, or if there is something I might be missing.
Here is the code I'm using to generate the thumbnail.
import os
import urllib.parse
import uuid

import boto3
import PIL.Image

s3 = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table(os.environ['DB_TABLE_NAME'])


def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    tmpkey = key.replace('/', '')
    download_path = '/tmp/{}{}'.format(uuid.uuid4(), tmpkey)
    upload_path = '/tmp/resized-{}'.format(tmpkey)
    try:
        s3.download_file(bucket, key, download_path)
        resize_image(download_path, upload_path)
        bucket = bucket.replace('source', 'thumbnail')
        s3.upload_file(upload_path, bucket, key)
        print(f"Thumbnail created and uploaded to {bucket} successfully.")
    except Exception as e:
        print(e)
        print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key, bucket))
        raise e
    else:
        s3.put_object_acl(ACL='public-read', Bucket=bucket, Key=key)
        # create the image URL to add to DynamoDB
        url = f"https://postreader-thumbnail.s3.us-west-2.amazonaws.com/{key}"
        print(url)
        # create the record id to update the appropriate record in the 'Posts' table
        recordId = key.replace('.jpeg', '')
        # add the image_url column along with the image URL as the value
        table.update_item(
            Key={'id': recordId},
            UpdateExpression="SET #statusAtt = :statusValue, #img_urlAtt = :img_urlValue",
            ExpressionAttributeValues={':statusValue': 'UPDATED', ':img_urlValue': url},
            ExpressionAttributeNames={'#statusAtt': 'status', '#img_urlAtt': 'img_url'},
        )


def resize_image(image_path, resized_path):
    with PIL.Image.open(image_path) as image:
        # resize to half the original dimensions
        image.thumbnail(tuple(x / 2 for x in image.size))
        image.save(resized_path)
This can happen if the Content-Type of the file you're uploading is binary/octet-stream. You can modify your script as below to provide a custom content type while uploading:
s3.upload_file(upload_path, bucket, key, ExtraArgs={'ContentType': 'image/jpeg'})
After more troubleshooting, the issue was apparently related to the bucket's name. I created a new bucket with a different name than the previous one, and after doing so I was able to upload and share images without issue.
I edited my code so that the Lambda uploads to the new bucket name, and I can now share the image via URL without it downloading.

How to store Google Auth Credentials object in Django?

I'm trying to integrate Google Tag Manager into my project. In the official documentation, Google suggests the oauth2client library. Unfortunately, that library is deprecated, so I used google_auth_oauthlib instead. I can get a token and send requests to the Google Tag Manager API with it, but I can't work out how to store the credentials object in Django. oauth2client provided a CredentialsField for models, so you could store a Credentials object in that field using DjangoORMStorage, but I can't use this deprecated library. Are there any alternative ways?
The Google Tag Manager documentation is here
My code is here:
from django.conf import settings
from google_auth_oauthlib.flow import Flow
from googleapiclient.discovery import build
from rest_framework.response import Response
from rest_framework.views import APIView

FLOW = Flow.from_client_secrets_file(
    settings.GOOGLE_OAUTH2_CLIENT_SECRETS_JSON,
    scopes=[settings.GOOGLE_OAUTH2_SCOPE])


class GTMAuthenticateView(APIView):
    def get(self, request, **kwargs):
        FLOW.redirect_uri = settings.GOOGLE_OAUTH2_REDIRECT_URI
        authorization_url, state = FLOW.authorization_url(
            access_type='offline',
            include_granted_scopes='true')
        return Response({'authorization_url': authorization_url, 'state': state})

    def post(self, request, **kwargs):
        FLOW.fetch_token(code=request.data['code'])
        credentials = FLOW.credentials
        return Response({'token': credentials.token})


class GTMContainerView(APIView):
    def get(self, request, **kwargs):
        service = get_service()
        containers = self.get_containers(service)
        return Response({'containers': str(containers), 'credentials': str(FLOW.credentials)})

    @staticmethod
    def get_containers(service):
        account_list = service.accounts().list().execute()
        container_list = []
        for account in account_list['account']:
            containers = service.accounts().containers().list(parent=account["path"]).execute()
            for container in containers["container"]:
                container["usageContext"] = container["usageContext"][0].replace("['", "").replace("']", "")
                container_list.append(container)
        return container_list


def get_service():
    try:
        credentials = FLOW.credentials
        service = build('tagmanager', 'v2', credentials=credentials)
        return service
    except Exception as ex:
        print(ex)
Just switch over to Django 2:
pip install Django==2.2.12
I did this and it works fine.
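For the storage question itself, one common approach (a minimal sketch; the model and helper names are hypothetical, not part of google-auth) is to serialize the credentials to JSON with Credentials.to_json() and keep them in a TextField:

import json

from django.db import models
from google.oauth2.credentials import Credentials


class StoredGoogleCredentials(models.Model):
    # One row per user; token_json holds the serialized credentials.
    user_id = models.IntegerField(unique=True)
    token_json = models.TextField()


def save_credentials(user_id, credentials):
    StoredGoogleCredentials.objects.update_or_create(
        user_id=user_id,
        defaults={'token_json': credentials.to_json()},
    )


def load_credentials(user_id):
    row = StoredGoogleCredentials.objects.get(user_id=user_id)
    return Credentials.from_authorized_user_info(json.loads(row.token_json))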

Google Cloud Storage giving ServiceUnavailable: 503 exception Backend Error

I'm trying to upload a file to a Google Cloud Storage bucket. While making the file public, I intermittently get this exception from Google. The error occurs roughly once in every 20 uploads.
google.api_core.exceptions.ServiceUnavailable: 503 GET https://www.googleapis.com/storage/v1/b/bucket_name/o/folder_name%2FPolicy-APP-000456384.2019-05-16-023805.pdf/acl: Backend Error
I'm using Python 3 and have tried updating google-cloud-storage to version 1.15.0, but it didn't help.
import logging
from datetime import datetime

import six
from google.cloud import storage

logger = logging.getLogger(__name__)


class GoogleStorageHelper:
    def __init__(self, project_name):
        self.client = storage.Client(project=project_name)

    def upload_file(self, bucket_name, file, file_name, content_type, blob_name, is_stream):
        safe_file_name = self.get_safe_filename(file_name)
        bucket = self.client.bucket(bucket_name)
        blob = bucket.blob(safe_file_name)
        if is_stream:
            blob.upload_from_string(file, content_type=content_type)
        else:
            blob.upload_from_filename(file, content_type=content_type)
        blob.make_public()  # getting the error here
        url = blob.public_url
        if isinstance(url, six.binary_type):
            url = url.decode('utf-8')
        logger.info('File uploaded, URL: {}'.format(url))
        return url

    @staticmethod
    def get_safe_filename(file_name):
        basename, extension = file_name.rsplit('.', 1)
        return '{0}.{1}.{2}'.format(basename, datetime.now().strftime('%Y-%m-%d-%H%M%S'), extension)
Have you faced this kind of problem and solved it? Or do you have any ideas to fix this issue?
This is a recently known issue with GCS's Python make_public() method, and it is now being worked on by the GCS team.
As a quick mitigation strategy, I'd suggest enabling retries. This documentation could be helpful in setting up a retry handling strategy.
This one is a bit tricky. I ran into the same issue and found that the Python API client doesn't enable retries for the upload_from_string() method.
All upload_from_string() does is call upload_from_file(), which accepts a num_retries argument, but the implementation ignores retries:
def upload_from_string(self,
                       data,
                       content_type="text/plain",
                       client=None,
                       predefined_acl=None):
    data = _to_bytes(data, encoding="utf-8")
    string_buffer = BytesIO(data)
    self.upload_from_file(
        file_obj=string_buffer,
        size=len(data),
        content_type=content_type,
        client=client,
        predefined_acl=predefined_acl,
    )
You can hack the upload_from_string() method by using the upload_from_file() implementation, adding retries:
from io import BytesIO

from google.cloud._helpers import _to_bytes
from google.cloud.storage import Blob


def upload_from_string(data, file_path, bucket, client, content_type, num_retries):
    data = _to_bytes(data, encoding="utf-8")
    string_buffer = BytesIO(data)
    blob = Blob(file_path, bucket)
    blob.upload_from_file(
        file_obj=string_buffer,
        size=len(data),
        client=client,
        num_retries=num_retries,
        content_type=content_type,
    )
To handle this error gracefully and wait as suggested by the 503 docs, note that these errors inherit from GoogleAPICallError and can therefore be inspected for the error code:
from google.api_core.exceptions import GoogleAPICallError

try:
    blob.upload_from_filename(YOUR_UPLOAD_PARAMETERS)
except GoogleAPICallError as e:
    if e.code == 503:
        print(f'GCP storage unavailable: {e}')
        ...  # handle the error gracefully, or simply ignore
    else:
        raise
Additionally, you may use retry.Retry as suggested in the docs:
from google.api_core import retry

blob.upload_from_filename(YOUR_UPLOAD_PARAMETERS, retry=retry.Retry())
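Since the 503s in this question actually came from the ACL call rather than the upload, a similar guard can wrap make_public() itself. A minimal sketch (the helper name and backoff constants are arbitrary choices, not library defaults):

import time

from google.api_core.exceptions import ServiceUnavailable


def make_public_with_retry(blob, attempts=5, base_delay=1.0):
    # Retry the ACL change with exponential backoff: 1s, 2s, 4s, ...
    for attempt in range(attempts):
        try:
            blob.make_public()
            return
        except ServiceUnavailable:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))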

How to force download an image on click with django and aws s3

I have this view, which takes a user_id and an image_id. When the user clicks the link, it checks whether there is an image. If there is, I would like the file to be downloaded automatically.
template:
<a class="downloadBtn" :href="website + '/download-image/'+ user_id+'/'+ image_id +'/'">Download</a>
Before, I was developing on my local machine, and this code was working.
@api_view(['GET'])
@permission_classes([AllowAny])
def download_image(request, user_id=None, image_id=None):
    try:
        ui = UserImage.objects.get(user=user_id, image=image_id)
        content_type = mimetypes.guess_type(ui.image.url)
        wrapper = FileWrapper(open(str(ui.image.file)))
        response = HttpResponse(wrapper, content_type=content_type)
        response['Content-Disposition'] = 'attachment; filename="image.jpeg"'
        return response
    except UserImage.DoesNotExist:
        ...
But now I am using AWS S3 for my static and media files, with django-storages and boto3. How can I force the browser to download the image?
@api_view(['GET'])
@permission_classes([AllowAny])
def download_image(request, user_id=None, image_id=None):
    try:
        ui = UserImage.objects.get(user=user_id, image=image_id)
        url = ui.image.url
        ...
        # FORCE DOWNLOAD THE IMAGE
        ...
    except UserImage.DoesNotExist:
        ...
        # ERROR, NO IMAGE AVAILABLE
        ...
You can just return an HttpResponse with the image itself:
return HttpResponse(instance.image, content_type="image/jpeg")
This will return the image's byte stream. The Content-Type header lets platforms like Postman display the image.
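An alternative that avoids proxying the bytes through Django at all (a sketch, assuming django-storages' usual settings names): generate a presigned URL with ResponseContentDisposition so S3 itself sends the attachment header, then redirect to it.

import boto3
from django.conf import settings
from django.shortcuts import redirect


def force_download(request, key):
    s3 = boto3.client('s3', region_name=settings.AWS_S3_REGION_NAME)
    url = s3.generate_presigned_url(
        ClientMethod='get_object',
        Params={
            'Bucket': settings.AWS_STORAGE_BUCKET_NAME,
            'Key': key,
            # Tell S3 to answer with Content-Disposition: attachment,
            # which makes the browser download instead of render.
            'ResponseContentDisposition': 'attachment; filename="image.jpeg"',
        },
        ExpiresIn=300,
    )
    return redirect(url)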

Download image data then upload to Google Cloud Storage

I have a Flask web app running on Google App Engine. The app has a form that my users will use to supply image links. I want to download the image data from the link and then upload it to a Google Cloud Storage bucket.
What I have found so far in Google's documentation tells me to use the 'cloudstorage' client library, which I have installed and imported as 'gcs'.
Found here: https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/read-write-to-cloud-storage
I think I am not handling the image data correctly through requests. I get a 200 code back from the Cloud Storage upload call, but there is no object when I look for it in the console. Here is where I try to retrieve the image and then upload it:
img_resp = requests.get(image_link, stream=True)
objectName = '/myBucket/testObject.jpg'
gcs_file = gcs.open(objectName,
                    'w',
                    content_type='image/jpeg')
gcs_file.write(img_resp)
gcs_file.close()
Edit: here is my updated code, reflecting an answer's suggestion:
image_url = urlopen(url)
content_type = image_url.headers['Content-Type']
img_bytes = image_url.read()
image_url.close()

filename = bucketName + objectName
options = {'x-goog-acl': 'public-read',
           'Cache-Control': 'private, max-age=0, no-transform'}

with gcs.open(filename,
              'w',
              content_type=content_type,
              options=options) as f:
    f.write(img_bytes)
However, I am still getting a 201 response on the POST (create file) call and then a 200 on the PUT call, but the object never appears in the console.
Try this:
import urllib2

image = urllib2.urlopen(image_url)
img_resp = image.read()
image.close()

objectName = '/myBucket/testObject.jpg'
options = {'x-goog-acl': 'public-read',
           'Cache-Control': 'private, max-age=0, no-transform'}

with gcs.open(objectName,
              'w',
              content_type='image/jpeg',
              options=options) as f:
    f.write(img_resp)
And why restrict them to just entering a URL? Why not also allow them to upload a local image:
if isinstance(image_or_url, basestring):  # should be a url
    if not image_or_url.startswith('http'):
        image_or_url = ''.join(['http://', image_or_url])
    image = urllib2.urlopen(image_or_url)
    content_type = image.headers['Content-Type']
    img_resp = image.read()
    image.close()
else:
    img_resp = image_or_url.read()
    content_type = image_or_url.content_type
If you are running on the development server, the file will be uploaded into your local datastore. Check it at:
http://localhost:<your admin port number>/datastore?kind=__GsFileInfo__
and
http://localhost:<your admin port number>/datastore?kind=__BlobInfo__
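For reference, the same download-then-upload flow with the modern google-cloud-storage client instead of the legacy App Engine cloudstorage library (a sketch; the bucket and object names are placeholders):

import requests
from google.cloud import storage


def mirror_image_to_gcs(image_link, bucket_name='myBucket', object_name='testObject.jpg'):
    # Fetch the raw bytes first; resp.content is the full decoded body.
    resp = requests.get(image_link)
    resp.raise_for_status()
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(object_name)
    blob.upload_from_string(
        resp.content,
        content_type=resp.headers.get('Content-Type', 'image/jpeg'),
    )
    return blob.public_url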