Copy an image to Amazon S3 from an image URL using Django

I have an image URL (for example: http://www.myexample.come/testImage.jpg) and I would like to upload this image to Amazon S3 using Django.
I haven't found a way to copy the resource into Amazon S3 directly by passing the file URL.
So I think I have to implement these steps in my project:
Download the file locally from the URL http://www.myexample.come/testImage.jpg; I will then have a local file testImage.jpg.
Upload the local file to Amazon S3; I will then have an S3 URL.
Delete the local file testImage.jpg.
Is this a good way to build this feature?
Is it possible to improve these steps?
I have to use this feature when I receive a REST request, and I have to respond with the uploaded S3 file URL in the response... Are these steps a good choice in terms of performance?

The easiest way off the top of my head would be to use requests together with io from the Python standard library. This is a bit of code I used a while back; I just tested it with Python 2.7.9 and it works:
>>> requests_image('http://docs.python-requests.org/en/latest/_static/requests-sidebar.png')
It works with the latest version of requests (2.6.0), but I should point out that it's just a snippet and I was in full control of the image URLs being handed to the function, so there's nothing in the way of error checking (you could use Pillow to open the image and confirm it's really a JPEG, etc.).
import requests
from io import open as iopen
from urlparse import urlsplit

def requests_image(file_url):
    suffix_list = ['jpg', 'gif', 'png', 'tif', 'svg']
    # Take the last path segment of the URL as the local file name.
    file_name = urlsplit(file_url)[2].split('/')[-1]
    file_suffix = file_name.split('.')[1]
    i = requests.get(file_url)
    if file_suffix in suffix_list and i.status_code == requests.codes.ok:
        with iopen(file_name, 'wb') as file:
            file.write(i.content)
    else:
        return False
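To cover the S3 half of the question without writing a local file at all, a rough sketch along these lines could fetch the image into memory and hand it to boto3's upload_fileobj (the bucket name, key, and returned URL format are illustrative, and credentials are assumed to be configured already):
from io import BytesIO

import boto3
import requests

def copy_url_to_s3(file_url, bucket_name, key):
    # Fetch the image into memory instead of saving it to disk.
    response = requests.get(file_url)
    response.raise_for_status()
    # Stream the bytes straight to S3.
    s3 = boto3.client('s3')
    s3.upload_fileobj(BytesIO(response.content), bucket_name, key)
    # Object URL to return in the REST response; the exact host format
    # can vary by region and bucket settings.
    return 'https://%s.s3.amazonaws.com/%s' % (bucket_name, key)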

Related

AWS Lambda - How to Put ONNX Models in AWS Layers

Currently, I have been downloading my ONNX models from S3 like so:
s3 = boto3.client('s3')
if os.path.isfile('/tmp/model.onnx') != True:
    s3.download_file('test', 'models/model.onnx', '/tmp/model.onnx')
inference_session = onnxruntime.InferenceSession('/tmp/model.onnx')
However, I want to reduce the latency of having to download this model. To do so, I am looking to store the model in an AWS Lambda layer, but I'm having trouble doing so.
I tried creating a ZIP file structured like so:
- python
- model.onnx
and loading it like inference_session = onnxruntime.InferenceSession('/opt/model.onnx') but I got a "File doesn't exist" error. What should I do to make sure that the model can be found in the /opt/ directory?
Note: My AWS Lambda function is running on Python 3.6.
Your file should be in /opt/python/model.onnx. Therefore, you should be able to use the following to get it:
inference_session = onnxruntime.InferenceSession('/opt/python/model.onnx')
If you don't want your file to be in the python folder, then don't create the layer with that folder: just put model.onnx in the zip's root, rather than inside the python folder.
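For reference, a minimal sketch of what the Lambda function might look like once the layer contains a top-level python/ folder with model.onnx in it (the input handling is illustrative and depends on your model). Creating the session at module scope lets warm invocations reuse it instead of reloading the model every time:
import onnxruntime

# Layers are extracted under /opt; a zip with a top-level python/ folder
# ends up at /opt/python/.
MODEL_PATH = '/opt/python/model.onnx'
inference_session = onnxruntime.InferenceSession(MODEL_PATH)

def handler(event, context):
    # The input name and payload shape depend on the model; this is a
    # placeholder, not the exact call for your model.
    input_name = inference_session.get_inputs()[0].name
    outputs = inference_session.run(None, {input_name: event['input']})
    return {'result': outputs[0].tolist()}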

Google Cloud Webapp to use a user uploaded file for processing

I recently moved my project from Heroku to Google Cloud. It's written in Flask and basically does some text summarization (nothing fancy) of an uploaded .docx file. I was able to use files locally on Heroku thanks to its ephemeral file system.
With Google Cloud, I find myself lost trying to take an uploaded file and run Python functions on it.
The error I'm getting is:
with open(self.file, 'rb') as file: FileNotFoundError: [Errno 2] No such file or directory: 'http://storage.googleapis.com/...'
I've edited the specifics out for now, but when I open the link in a browser it brings up the download window. I know the file gets there, since when I go to Google Cloud everything is in the proper bucket.
Also, is there a way to delete from the bucket immediately after Python goes through the document? Currently I have the lifecycle set to a day, but I only need the data temporarily.
I'm sorry if these are silly questions. Very new to this and trying to learn.
Thanks
Oh, and here's the current code:
gcs = storage.Client()
user_file = request.files['file']
local = secure_filename(user_file.filename)
blob = bucket.blob(local)
blob.upload_from_string(user_file.read(),content_type=user_file.content_type)
this_file = f"http://storage.googleapis.com/{CLOUD_STORAGE_BUCKET}/{local}"
Then a function is supposed to open this_file, the public URL to the file that needs to be processed and used:
def open_file(self):
    url = self.file
    file = BytesIO(requests.get(url).content)
    return docx.Document(file)
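One way to avoid both problems, sketched here with the names from the snippet above rather than taken from the original thread, is to read the uploaded bytes back through the Cloud Storage client instead of open()-ing the public URL, and to delete the blob as soon as processing finishes:
from io import BytesIO

import docx
from google.cloud import storage

def summarize_and_discard(bucket_name, blob_name):
    # Read the object back from the bucket without going through the URL.
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)
    # download_as_bytes() is download_as_string() on older client versions.
    document = docx.Document(BytesIO(blob.download_as_bytes()))
    # ... run the summary functions on `document` here ...
    # Remove the object right away instead of waiting for the lifecycle rule.
    blob.delete()
    return document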

using file stored on aws s3 storage -- media storage as input of a process in django view

I am trying to put my Django app into production on AWS. I used Elastic Beanstalk to deploy it, so the EC2 instance is created and connected to an RDS MySQL database instance, and I use an Amazon S3 bucket to store my media files.
When a user uploads a video, it is stored in S3 as: "https://bucketname.s3.amazonaws.com/media/videos/videoname.mp4".
In Django development mode, I was using the video filename as input to a batch script which produces a video as output.
My view in development mode is as follows:
def get(request):
    # get video
    var = Video.objects.order_by('id').last()
    v = '/home/myproject/media/videos/' + str(var)
    # call process
    subprocess.call("./step1.sh %s" % (str(v)), shell=True)
    return render(request, 'endexecut.html')
In production mode on AWS (the problem), I tried:
v = 'https://bucketname.s3.amazonaws.com/media/videos/' + str(var)
but the batch script doesn't accept the URL as input.
How can I use my video file from the S3 bucket for processing in my view as I described? Thank you in advance.
You should not hard-code that string. There are a couple of things wrong with that:
"bucketname" is not the name of your bucket. You should use the name of your bucket if this would at all work.
Your Media File URl (In settings.py) should be pointing to the bucket url where your files are saved (If it's well configured). So you can make use of:
video_path = settings.MEDIA_URL + video_name
I am assuming you are using s3boto to handle your storages (That's not a prerequisite though, it only makes your storage handling smarter and it's highly recommended if you are pushing to s3 from a django app)
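Since ./step1.sh expects a local path rather than a URL, the production view still needs a local copy of the video before calling the script. A rough sketch using boto3 (the bucket and key names are illustrative and the helper name is made up):
import os
import subprocess
import tempfile

import boto3

def run_step1_from_s3(bucket_name, key):
    # Download the S3 object to a temporary local file so the batch
    # script can read it as a normal path.
    s3 = boto3.client('s3')
    suffix = os.path.splitext(key)[1]
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        s3.download_fileobj(bucket_name, key, tmp)
        local_path = tmp.name
    try:
        subprocess.call(['./step1.sh', local_path])
    finally:
        # Clean up the temporary copy once the script has run.
        os.remove(local_path)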

How to find the origin of data?

So far the files are just being downloaded individually like the following rather than all being in one zipped file:
s3client = boto3.client('s3')
s3client.download_file('firstbucket', obj['Key'], filename)
Let me save you some trouble by using AWS CLI:
aws s3 cp s3://mybucket/mydir/ . --recursive ; zip myzip.zip *.csv
You can change the wildcard to suit your needs, but this will work inherently faster than Python, since the AWS CLI has been optimized far beyond the capabilities of boto.
If you want to use boto, you'll have to do it in a loop like you have and add each item to a zip file.
With the CLI you can use s3 sync and then zip that up:
https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
aws s3 sync s3://bucket-name ./local-location && zip bucket.zip ./local-location
It looks like you're really close, but you need to pass a file name to ZipFile.write() and download_file does not return a file name. The following should work alright, but I haven't tested it exhaustively.
from tempfile import NamedTemporaryFile
from zipfile import ZipFile
import boto3

def archive_bucket(bucket_name, zip_name):
    s3 = boto3.client('s3')
    paginator = s3.get_paginator('list_objects_v2')
    with ZipFile(zip_name, 'w') as zf:
        for page in paginator.paginate(Bucket=bucket_name):
            for obj in page['Contents']:
                # This might have issues on some systems since the file will
                # be open for writes in two places. You can use other
                # methods of creating a temporary file to work around that.
                with NamedTemporaryFile() as f:
                    s3.download_file(bucket_name, obj['Key'], f.name)
                    # Copies over the temporary file using the key as the
                    # file name in the zip.
                    zf.write(f.name, obj['Key'])
This has less space usage than the solutions using the CLI, but it still isn't ideal. You will still have two copies of a given file at some point in time: one in the temp file and one that has been zipped up. So you need to make sure that you have enough space on disk to hold all the files you're downloading plus the largest of those files. If there were a way to open a file-like object that wrote directly to a file inside the zip archive, you could get around that. I don't know how to do that, however.
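Since Python 3.6, ZipFile.open() also accepts mode 'w' and returns a writable file-like object inside the archive, so a variation on the function above (a sketch, not exhaustively tested) can stream each object straight into its zip entry and skip the temporary file entirely:
import shutil
from zipfile import ZipFile, ZIP_DEFLATED

import boto3

def archive_bucket_streaming(bucket_name, zip_name):
    s3 = boto3.client('s3')
    paginator = s3.get_paginator('list_objects_v2')
    with ZipFile(zip_name, 'w', ZIP_DEFLATED) as zf:
        for page in paginator.paginate(Bucket=bucket_name):
            for obj in page.get('Contents', []):
                # Writable handle for this entry inside the archive.
                with zf.open(obj['Key'], 'w') as entry:
                    body = s3.get_object(Bucket=bucket_name, Key=obj['Key'])['Body']
                    # Copy the streaming body into the zip entry in chunks.
                    shutil.copyfileobj(body, entry)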

How to download Django media file uploaded using Django storages?

I have successfully stored a Word document to S3 using Django-storages.
class Document(TitleSlugDescriptionModel, TimeStampedModel):
    document = models.FileField(upload_to=user_directory_path)
Now, in a Celery task, I need to download this file again for further processing in the worker.
Do I need to read the file from the URL and then create a local copy explicitly, or is there any way to create a local copy using django-storages?
You can read the file directly in Django using the read method as in
document.read()
This will output binary data, which you can save to a file using:
f = open(filename, 'wb')
f.write(document.read())
f.close()
You can also generate URLs to the file on S3. More info on both is in the docs: https://django-storages.readthedocs.io/en/latest/backends/amazon-S3.html
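For the Celery worker itself, a small sketch of how this might look, assuming the task is handed the Document primary key (the FileField's storage backend does the S3 access, so there is no need to fetch the URL manually):
import os

from celery import shared_task

# Document is assumed to be the model shown in the question.
from .models import Document

@shared_task
def process_document(document_id):
    doc = Document.objects.get(pk=document_id)
    # Reading the FileField goes through django-storages, which fetches
    # the object from S3 behind the scenes.
    data = doc.document.read()
    doc.document.close()
    # Write a local copy only if downstream processing needs a real path.
    local_path = os.path.join('/tmp', os.path.basename(doc.document.name))
    with open(local_path, 'wb') as local_file:
        local_file.write(data)
    return local_path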