Best way to write an image to a Django HttpResponse()

I need to serve images securely to validated users only (i.e. they can't be served as static files). I currently have the following Python view in my Django project, but it seems inefficient. Any ideas for a better way?
def secureImage(request, imagePath):
    # content_type replaces the old mimetype argument (removed in Django 1.7)
    response = HttpResponse(content_type="image/png")
    img = Image.open(imagePath)
    img.save(response, 'png')
    return response
(Image is imported from PIL.)

Well, re-encoding is sometimes needed (e.g. applying a watermark to an image while keeping the original untouched), but for the simplest cases you can use:
try:
    with open(valid_image, "rb") as f:
        return HttpResponse(f.read(), content_type="image/jpeg")
except IOError:
    # Fall back to a 1x1 red placeholder if the image is missing.
    # (JPEG has no alpha channel, so the image must be RGB, not RGBA.)
    red = Image.new('RGB', (1, 1), (255, 0, 0))
    response = HttpResponse(content_type="image/jpeg")
    red.save(response, "JPEG")
    return response

Make use of FileResponse
A cleaner way: here we don't have to worry about the Content-Length and Content-Type headers; they are set automatically, guessed from the file object passed in.
from django.http import FileResponse
def send_file(request):
    # FileResponse streams the file and closes it when the response ends.
    img = open('media/hello.jpg', 'rb')
    return FileResponse(img)

Just stumbled on the advice above (which is somewhat bad for production) and thought I would mention X-Sendfile, which works with both Apache and Nginx, and probably other web servers too.
https://pythonhosted.org/xsendfile/
Modern Web servers like Nginx are generally able to serve files faster, more efficiently and more reliably than any Web application they host. These servers are also able to send to the client a file on disk as specified by the Web applications they host. This feature is commonly known as X-Sendfile.
This simple library makes it easy for any WSGI application to use X-Sendfile, so that they can control whether a file can be served or what else to do when a file is served, without writing server-specific extensions. Use cases include:
- Restrict document downloads to authenticated users.
- Log who's downloaded a file.
- Force a file to be downloaded instead of rendered by the browser, or serve it with a name different from the one on disk, by setting the Content-Disposition header.
The basic idea is that you open the file and pass the handle back to the web server, which then returns the bytes to the client, freeing your Python code to handle the next request. This is far more performant than the solutions above, where a slow client on the other end can hang your Python thread for as long as it takes to download the file.
Here is a repo that shows how to do this for various webservers and although it is pretty old, it will at least give you an idea of what you need to do. https://github.com/johnsensible/django-sendfile
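For Nginx specifically, the same idea goes by the name X-Accel-Redirect. Below is a minimal, untested sketch of a Django view using it; it assumes an internal location /protected/ is configured in Nginx to map to the directory holding the images, and the view and URL names are illustrative, not part of any library mentioned above.
import os

from django.http import HttpResponse, HttpResponseForbidden

def secure_image(request, image_name):
    if not request.user.is_authenticated:
        return HttpResponseForbidden()
    response = HttpResponse()
    # Nginx intercepts this header and serves the file itself; the
    # /protected/ location must be marked `internal` in the Nginx config.
    response['X-Accel-Redirect'] = '/protected/{}'.format(os.path.basename(image_name))
    # Clear Django's default Content-Type so Nginx can set it from the file.
    del response['Content-Type']
    return response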

Related

Django rest framework large file upload

Is there an easy way to upload large files from the client side to a Django REST framework endpoint? In my application, users will be uploading very large files (>4GB). Browsers have an upload limit; here's the chart.
My current idea is to upload the file in chunks from the client side and receive the chunks at the REST endpoint. But how will I do that? I saw some libraries like resumable.js, tus.js, flow.js, etc. But how will I handle the chunks in the backend? Is there any library that is actively maintained for a problem like this? Please help me.
Maybe this module could help: https://github.com/jkeifer/drf-chunked-upload.
The module is used in a sample Django app at the link, with example code for implementation. Here is the typical usage flow the module provides, without the sample code for simplicity (the code is at the link if you want it; a client-side sketch also follows the response example below):
1. An initial PUT request is sent to the URL linked to ChunkedUploadView (or any subclass) with the first chunk of the file. The name of the chunk file can be overridden in the view (class attribute field_name).
2. In return, the server will respond with the URL of the upload and the current offset.
3. Repeatedly PUT subsequent chunks to the URL returned from the server.
4. The server will continue responding with the URL and current offset.
5. Finally, when the upload is complete, POST a request to the returned URL. This request must include the checksum (hex) of the entire file.
6. If everything is OK, the server will respond with status code 200 and the data returned in the method get_response_data (if any).
If you want to upload a file as a single chunk, this is also possible! Simply make the first request a POST and include the checksum digest for the file. You don't need to include the Content-Range header if uploading a whole file.
Based on these instructions, it seems that the server handles the upload by tracking the chunk offsets through the received headers ("Content-Range"), as well as its URL, storing the uploaded chunks in .part files. It then responds like so:
{
    'id': 'f64ebd67-83a3-45b6-8acd-c749ea1ed4cd',
    'url': 'https://your-host/<path_to_view>/f64ebd67-83a3-45b6-8acd-c749ea1ed4cd',
    'file': 'https://your-host/<path_to_file>/f64ebd67-83a3-45b6-8acd-c749ea1ed4cd.part',
    'filename': 'example.bin',
    'offset': 10000,
    'created_at': '2021-05-18T17:12:50.318718Z',
    'status': 1,
    'completed_at': None,
    'user': 1
}
When the full file has been uploaded, as determined by the received headers, the .part files are combined into the final upload. This also allows you to resume uploads if they are interrupted, because the existing .part files persist until the upload finishes.
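For illustration, here is a minimal, untested sketch of what the client side of that flow could look like with the requests library. The endpoint URL, the 'file' field name, and the 'md5' completion parameter are assumptions based on the description above, so check them against the module's docs.
import hashlib
import os
import requests

UPLOAD_URL = 'https://your-host/api/chunked_upload/'  # hypothetical endpoint
CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB per chunk

def upload_in_chunks(path):
    total = os.path.getsize(path)
    md5 = hashlib.md5()
    url, offset = UPLOAD_URL, 0
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            md5.update(chunk)
            end = offset + len(chunk) - 1
            resp = requests.put(
                url,
                files={'file': (os.path.basename(path), chunk)},
                headers={'Content-Range': 'bytes %d-%d/%d' % (offset, end, total)},
            )
            resp.raise_for_status()
            body = resp.json()
            # The server tells us where to send the next chunk.
            url, offset = body['url'], body['offset']
    # Complete the upload by POSTing the checksum of the entire file.
    requests.post(url, data={'md5': md5.hexdigest()}).raise_for_status()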
https://stackoverflow.com/a/26278960/12776116
Maybe this can help you. As you mentioned, the file is uploaded by breaking it into small parts.
You should do it with Celery tasks.
Take a look at this link; it explains how to upload a file using Django and Celery.

I can't upload multiple large files

I am developing an application with Django, AWS S3 and hosted on Heroku.
At one point users have to upload multiple large files, totaling around 150MB each time.
I have tried various approaches.
1st attempt: directly call the save method of the Django form:
Result: the request takes more than 30 seconds and returns a timeout.
2nd attempt: temporarily save the file to a Heroku directory and read it from Celery task.
Result: Cannot save because it throws FileNotFoundError: [Errno 2] No such file or directory on production.
3rd attempt: pass the uploaded files (in-memory files) to a Celery task, but the bytes cannot be serialized and passed to the task with either JSON or pickle.
Could anyone help me please?
Thanks in advance.
Another approach would be the following:
Expose an API to generate a presigned URL for the frontend (steps are here).
Upload the files using that URL from the frontend asynchronously. That offloads the computation from your backend.
After a successful upload, you will get the URL of the file location. Now save the S3 URL, along with the other fields' data, to the Django model (a sketch of the presigned-URL view follows below).
You can upload files larger than 150MB with this method, and your system will be scalable.
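A minimal sketch of the presigned-URL endpoint with boto3, assuming standard AWS credentials are configured; the bucket name, key prefix, and view wiring are placeholders:
import boto3

from django.http import JsonResponse

def presigned_upload(request):
    s3 = boto3.client('s3')
    data = s3.generate_presigned_post(
        Bucket='my-bucket',  # hypothetical bucket name
        Key='uploads/%s' % request.GET['filename'],
        ExpiresIn=3600,  # URL valid for one hour
    )
    # The frontend POSTs the file directly to data['url'] with data['fields'],
    # so the bytes never pass through the Django dyno.
    return JsonResponse(data)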

How to facilitate downloading both CSV and PDF from API Gateway connected to S3

In the app I'm working on, we have a process whereby a user can download a CSV or PDF version of their data. The generation works great, but I'm trying to get it to download the file and am running into all sorts of problems. We're using API Gateway for all the requests, and the generation happens inside a Lambda on a POST request. The GET endpoint takes a file_name parameter, constructs the path in S3, and then makes the request directly there.
The problem I'm having is when I'm trying to transform the response. I get a 500 error, and when I look at the logs, it says: Execution failed due to configuration error: Unable to transform response. So, clearly, that's where I've spent most of my time. I've tried at least 50 different iterations of templates and combinations with little success. The closest I've gotten is the following code, where the CSV downloads fine but the PDF is no longer a valid PDF:
CSV:
#set($contentDisposition = "attachment;filename=${method.request.querystring.file_name}")
$input.body
#set($context.responseOverride.header.Content-Disposition = $contentDisposition)
PDF:
#set($contentDisposition = "attachment;filename=${method.request.querystring.file_name}")
$util.base64Encode($input.body)
#set($context.responseOverride.header.Content-Disposition = $contentDisposition)
where contentHandling = CONVERT_TO_TEXT. My binaryMediaTypes just has application/pdf and that's it. My goal is to get this working without having to offload the problem onto a Lambda, so we don't have that overhead at the download step. Any ideas how to do this right?
Just as another comment: I've tried CONVERT_TO_BINARY and just leaving it as Passthrough. I've tried it with text/csv as another binary media type, and I've tried different combinations of encoding and decoding base64. I know the data is coming back correctly from S3, but the transformation is where it's breaking. I'm happy to post more logs if need be. Also, I'm pretty sure this makes sense on Stack Overflow, but if it would fit another Stack Exchange site better, please let me know.
Resources I've looked at:
https://docs.aws.amazon.com/apigateway/latest/developerguide/request-response-data-mappings.html
https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-mapping-template-reference.html#util-template-reference
https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-payload-encodings-workflow.html
https://docs.amazonaws.cn/en_us/apigateway/latest/developerguide/api-gateway-payload-encodings-configure-with-control-service-api.html.
(But they're all so confusing...)
EDIT: One idea I've had is to use CONVERT_TO_BINARY and somehow base64-encode the CSVs in the transformation, but I can't figure out how to do it right. I keep feeling like I'm misunderstanding the order of things, specifically when the "CONVERT" part happens, if that makes any sense.
EDIT 2: So, I got rid of the $util.base64Encode in the PDF one, and now I have a PDF that's empty. The actual file in S3 definitely has things in it, but for some reason CONVERT_TO_TEXT is not handling it right, or I'm still not understanding how this all works.
Had similar issues. One major thing is the Accept header. I was testing in Chrome, which sends an Accept header like text/html,application/xhtml..., and API Gateway ignores everything except the first one (text/html). It will then convert any response from S3 to base64 to try to conform to text/html.
At last, after trying everything else, I tried via Postman, which defaults the Accept header to */*. I also set the content handling on the Integration response to Passthrough, and everything was working!
One other thing is to pass the Content-Type and Content-Length headers through (add them in the method response first and then in the integration response):
Content-Length: integration.response.header.Content-Length
Content-Type: integration.response.header.Content-Type
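As a quick way to reproduce the Accept-header behavior described above outside a browser, the snippet below sends an explicit Accept: */* header; the endpoint URL and query parameter are placeholders for your own API:
import requests

resp = requests.get(
    'https://your-api-id.execute-api.us-east-1.amazonaws.com/prod/download',
    params={'file_name': 'report.pdf'},
    headers={'Accept': '*/*'},  # avoids API Gateway keying off text/html
)
resp.raise_for_status()
with open('report.pdf', 'wb') as f:
    f.write(resp.content)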

Copy an image in Amazon S3 from Image URL using Django

I have an image URL (for example: http://www.myexample.come/testImage.jpg) and I would like to upload this image to Amazon S3 using Django.
I haven't found a way to copy the resource into Amazon S3 directly by passing the file URL.
So, I think I have to implement these steps in my project:
Download the file locally from the URL http://www.myexample.come/testImage.jpg. I will have a local file testImage.jpg.
Upload the local file to Amazon S3. I will have an S3 URL.
Delete the local file testImage.jpg.
Is this a good way to build this feature?
Is it possible to improve these steps?
I have to use this feature when I receive a REST request, and I have to respond by passing the uploaded S3 file URL in the response... Are these steps a good way in terms of performance?
The easiest way off the top of my head would be to use requests with io from the Python standard library -- this is a bit of code I used a while back; I just tested it with Python 2.7.9 and it works:
>>> requests_image('http://docs.python-requests.org/en/latest/_static/requests-sidebar.png')
and it works with the latest version of requests (2.6.0) -- but I should point out that it's just a snippet. I was in full control of the image URLs being handed to the function, so there's nothing in the way of error checking (you could use Pillow to open the image and confirm it's really a JPEG, etc.).
import requests
from io import open as iopen
from urlparse import urlsplit  # Python 3: from urllib.parse import urlsplit

def requests_image(file_url):
    suffix_list = ['jpg', 'gif', 'png', 'tif', 'svg']
    # Take the last path segment as the file name, e.g. 'testImage.jpg'.
    file_name = urlsplit(file_url)[2].split('/')[-1]
    file_suffix = file_name.split('.')[-1]
    i = requests.get(file_url)
    if file_suffix in suffix_list and i.status_code == requests.codes.ok:
        with iopen(file_name, 'wb') as file:
            file.write(i.content)
        return file_name
    else:
        return False
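To answer the "is it possible to improve these steps?" part: you can skip the local file entirely by streaming the response body straight into S3. This is an untested sketch using boto3; the bucket name and the resulting URL format are placeholders:
import boto3
import requests

def copy_url_to_s3(file_url, bucket='my-bucket'):
    key = file_url.rsplit('/', 1)[-1]
    with requests.get(file_url, stream=True) as r:
        r.raise_for_status()
        r.raw.decode_content = True  # transparently handle gzip/deflate
        # upload_fileobj reads the file-like object in chunks, so the
        # image never has to be written to local disk.
        boto3.client('s3').upload_fileobj(r.raw, bucket, key)
    return 'https://%s.s3.amazonaws.com/%s' % (bucket, key)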

How to mix Django, Uploadify, and S3Boto Storage Backend?

Background
I'm doing fairly big file uploads on Django. File size is generally 10MB-100MB.
I'm on Heroku and I've been hitting the request timeout of 30 seconds.
The Beginning
In order to get around the limit, Heroku's recommendation is to upload from the browser DIRECTLY to S3.
Amazon documents this by showing you how to write an HTML form to perform the upload.
Since I'm on Django, rather than write the HTML by hand, I'm using django-uploadify-s3 (example). This provides me with an SWF object, wrapped in JS, that performs the actual upload.
This part is working fine! Hooray!
The Problem
The problem is in tying that data back to my Django model in a sane way.
Right now the data comes back as a simple URL string, pointing to the file's location.
However, I was previously using S3 Boto from django-storages to manage all of my files as FileFields, backed by the delightful S3BotoStorageFile.
To reiterate, S3 Boto is working great in isolation, Uploadify is working great in isolation, the problem is in putting the two together.
My understanding is that the only way to populate the FileField is by providing both the filename AND the file content. When you're uploading files from the browser to Django, this is no problem, as Django has the file content in a buffer and can do whatever it likes with it. However, when doing direct-to-S3 uploads like mine, Django receives only the file name and URL, not the binary data, so I can't properly populate the FileField.
Cry For Help
Anyone know a graceful way to use S3Boto's FileField in conjunction with direct-to-S3 uploading?
Else, what's the best way to manage an S3 file just based on its URL? Including setting expiration, key id, etc.
Many thanks!
Use a URLField.
I had a similar issue where I wanted to store the file to S3 either directly using a FileField, or give the user the option to input the URL directly. To circumvent that, I used 2 fields in my model, one FileField and one URLField. In the template I could then use 'or' to pick whichever one exists, like {{ instance.filefield or instance.url }}. A minimal model sketch follows.
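For illustration, a rough sketch of that two-field model; the model and field names are made up:
from django.db import models

class Document(models.Model):
    filefield = models.FileField(upload_to='uploads/', blank=True)
    url = models.URLField(blank=True)

    @property
    def file_url(self):
        # Prefer the locally managed file; fall back to the external URL.
        return self.filefield.url if self.filefield else self.url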
This is untested, but you should be able to use:
from django.core.files.storage import default_storage

f = default_storage.open('name_you_expect_in_s3', 'r')
# f is an instance of S3BotoStorageFile and can be assigned to a field.
obj, created = YourObject.objects.get_or_create(**stuff_you_know)
obj.s3file_field = f
obj.save()
I think this should set up the local pointer to S3 and save it, without overwriting the content.
ETA: You should do this only after the upload completes on S3 and you know the key in s3.
Check out django-filetransfers. It looks like it plays nice with django-storages.
I've never used Django, so YMMV :) but why not just write a single byte to populate the content? That way, you can still use FieldFile.
I'm thinking that writing actual SQL may be the easiest solution here. Alternatively, you could subclass S3BotoStorage, override the _save method, and allow for an optional flag which sidesteps all the other saving stuff and just returns the cleaned name. A rough sketch of that idea follows.
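This is an untested sketch of the subclass idea, assuming the old django-storages import path for S3BotoStorage; the already_in_s3 flag is a made-up convention for marking content that was uploaded directly:
from storages.backends.s3boto import S3BotoStorage

class ExistingFileS3Storage(S3BotoStorage):
    def _save(self, name, content):
        # If the content is flagged as already uploaded (e.g. by a
        # direct-to-S3 upload), skip the upload and just return the key.
        if getattr(content, 'already_in_s3', False):
            return self._clean_name(name)
        return super(ExistingFileS3Storage, self)._save(name, content)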