How to mix Django, Uploadify, and S3Boto Storage Backend? - django

Background
I'm doing fairly big file uploads on Django. File size is generally 10MB-100MB.
I'm on Heroku and I've been hitting the request timeout of 30 seconds.
The Beginning
In order to get around the limit, Heroku's recommendation is to upload from the browser DIRECTLY to S3.
Amazon documents this by showing you how to write an HTML form to perform the upload.
Since I'm on Django, rather than write the HTML by hand, I'm using django-uploadify-s3 (example). This provides me with an SWF object, wrapped in JS, that performs the actual upload.
This part is working fine! Hooray!
The Problem
The problem is in tying that data back to my Django model in a sane way.
Right now the data comes back as a simple URL string, pointing to the file's location.
However, I was previously using S3 Boto from django-storages to manage all of my files as FileFields, backed by the delightful S3BotoStorageFile.
To reiterate, S3 Boto is working great in isolation, Uploadify is working great in isolation, the problem is in putting the two together.
My understanding is that the only way to populate a FileField is by providing both the filename AND the file content. When you're uploading files from the browser to Django, this is no problem, since Django has the file content in a buffer and can do whatever it likes with it. However, when doing direct-to-S3 uploads like mine, Django receives only the file name and URL, not the binary data, so I can't properly populate the FieldFile.
Cry For Help
Anyone know a graceful way to use S3Boto's FileField in conjunction with direct-to-S3 uploading?
Otherwise, what's the best way to manage an S3 file based only on its URL, including setting expiration, the key ID, etc.?
Many thanks!

Use a URLField.

I had a similar issue where I wanted to either store a file to S3 directly using a FileField, or give the user the option to input a URL directly. To handle both, I used two fields on my model: a FileField and a URLField. In the template I could then use 'or' to pick whichever one exists, like {{ instance.filefield or instance.url }}.
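A minimal sketch of that two-field model (the model and field names here are made up for illustration):

from django.db import models

class Document(models.Model):
    # Populated when the user uploads a file through Django
    filefield = models.FileField(upload_to='docs/', blank=True)
    # Populated when the user pastes a URL instead
    url = models.URLField(blank=True)

In the template, {{ instance.filefield.url or instance.url }} then resolves to whichever source is set.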

This is untested, but you should be able to use:
from django.core.files.storage import default_storage

# f is an instance of S3BotoStorageFile and can be assigned to a FileField
f = default_storage.open('name_you_expect_in_s3', 'r')
obj, created = YourObject.objects.get_or_create(**stuff_you_know)
obj.s3file_field = f
obj.save()
I think this should set up the local pointer to S3 and save it, without overwriting the content.
ETA: You should do this only after the upload completes on S3 and you know the key in S3.
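If you want to be defensive about that, Django's Storage API has an exists() check you could use as a guard; a small sketch reusing the hypothetical names above:

from django.core.files.storage import default_storage

name = 'name_you_expect_in_s3'
if default_storage.exists(name):  # the key really landed in S3
    obj.s3file_field = default_storage.open(name, 'r')
    obj.save()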

Check out django-filetransfers. It looks like it plays nicely with django-storages.

I've never used Django, so YMMV :) but why not just write a single byte to populate the content? That way, you can still use FieldFile.

I'm thinking that writing actual SQL may be the easiest solution here. Alternatively, you could subclass S3BotoStorage, override the _save method, and allow for an optional flag that sidesteps all the other saving stuff and just returns the cleaned name; a sketch of the idea follows.
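Something like this (the already_in_s3 flag and this override are my assumptions, not django-storages API; _save and _clean_name are internals of the S3BotoStorage backend of that era and may differ in your version):

from storages.backends.s3boto import S3BotoStorage

class DirectUploadS3Storage(S3BotoStorage):
    def _save(self, name, content):
        # If the browser already uploaded the bytes straight to S3,
        # skip the re-upload and just record the cleaned key name.
        if getattr(content, 'already_in_s3', False):
            return self._clean_name(name)
        return super(DirectUploadS3Storage, self)._save(name, content)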

Related

Correct way to fetch data from an AWS server into a Flutter app?

I have a general understanding question. I am building a Flutter app that relies on a content library containing text files, LaTeX equations, images, PDFs, videos, etc.
The content lives on an AWS Amplify backend. Depending on the user's navigation in the app, the corresponding data is fetched and displayed.
I am not sure about the correct way of fetching the data. The current method (which works) is that the data is stored in an S3 bucket; when data is requested, it is downloaded to a temporary directory and then opened and processed in the app. This is actually not slow, but I feel it is not the way it should be done.
When data is downloaded, a file-transfer notification pops up, which bothers me because it is shown all the time. I would also like to read the data directly with something like a GET request, without downloading the file first (especially for text files, which I would like to read directly into a String). But I don't know how that works, because I don't see a way to store data in a file system with the other Amplify services like DataStore or the REST API. The S3 bucket is an intuitive way of storing data that is easy for the content creators at my company to use, so S3 seems like the way to go. However, with S3 I have only figured out the download method for fetching data.
Could someone give me a hint on the correct approach for this use case? Thank you very much!

Django: Insert image to PostgreSQL (PgAdmin4)

I'm currently working on an e-shop. My idea is to store images with Django models in PgAdmin4. As I saw in older posts, methods like bytea('D:\image.jpg') just convert the string constant to its binary representation.
So my question is whether there is a newer method to store the actual image, or whether it is possible to grab the image via a path?
models.py
image = models.ImageField(null=True, blank=True)
PgAdmin4
INSERT INTO product_images(id, image)
VALUES (SERIAL, ?); -- how to insert the image?
There are several options for keeping images. The first is to use a storage service like S3, which I recommend. You can read this article for more detailed information; I can also recommend a third-party package I have used that makes S3 ready to use with Django. If you use this option, the ImageField will keep the path to the file in S3.
Another option, if you are using only one server, is to keep the pictures on that server's local disk. Again, the ImageField will keep the path.
If you want to keep images directly in the database, you can follow this link. Currently, there is no newer method for it.
But I have to say that I think using a storage service like S3 is the best way under all circumstances.
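For the S3 option, a minimal settings.py sketch using django-storages' boto3 backend (the bucket name and credentials are placeholders):

# settings.py
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage'
AWS_STORAGE_BUCKET_NAME = 'your-bucket'
AWS_ACCESS_KEY_ID = 'your-access-key-id'
AWS_SECRET_ACCESS_KEY = 'your-secret-key'

Your models.py stays exactly as it is; the ImageField now stores the S3 key, uploads go to the bucket, and there is nothing to INSERT by hand in PgAdmin4.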

What is the correct way to set up S3 for loading content in the browser?

I want to do the following: a user in a browser types some text and after he presses a 'Save' button, the text should be saved in a file (for example: content.txt) in a folder (for example: /username_text) on the root of an S3 bucket.
Also, I want the user to be able, when he visits the same page, to load the content from S3 and continue working on the file. Then, when he/she is done, save the file to S3 again.
Probably important to mention, but I plan on using NodeJS for my back-end...
My question now is: What is the best way to set this storing-and-retrieving thing up? Do I create an API gateway + Lambda function to GET and POST files through that? Or do I for example use the aws-sdk in Node to directly push and pull files from S3? Or is there a better way to do this?
I looked at the following two guides:
Using AWS S3 Buckets in a NodeJS App – Codebase – Medium
Image Upload and Retrieval from S3 Using AWS API Gateway and Lambda
Welcome to StackOverflow!
I think you are worrying too much about the not-so-important stuff. S3 is nothing but a storage system. You could have decided to store the content of these files on DynamoDB, RDS, etc. What would you do if you stored their contents on these real databases? You'd fetch the data and display it to the user, wouldn't you?
This is what you need to do with S3! S3 is a smart choice on your scenario because your "file" can grow very big and S3 is a great place for storing files. However, apparently, you're not actually storing files (think of .pdf, .mp4, .mov, etc.), you're essentially only storing human-readable text.
So here's one approach on how to solve your problem:
FETCHING FILE CONTENT
User logs in
You fetch the user's personal information based on some token. You can store all the metadata in DynamoDB, where, given a user_id, you fetch all the "files" for this user. These "files" (metadata only) would be the bucket and key for the actual file on S3.
You use the getObject API from S3 to fetch the file based on your query and display the body of your file to your user in a RESTful way. Your response should look something like this:
{
"content": "some content"
}
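A minimal sketch of that fetch step as a Python Lambda handler with boto3 (the asker plans NodeJS, where the aws-sdk getObject call is analogous; the bucket and key names here are placeholders):

import json
import boto3

s3 = boto3.client('s3')

def get_file_content(event, context):
    # In the real flow you would look up bucket/key in DynamoDB based on
    # the user_id/file_id from the request; both names here are made up.
    obj = s3.get_object(Bucket='your-content-bucket', Key=event['file_key'])
    content = obj['Body'].read().decode('utf-8')
    return {
        'statusCode': 200,
        'body': json.dumps({'content': content}),
    }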
SAVING FILE CONTENT
User logs in
The user writes anything in a form and submits it. In your Lambda function, you grab the content of this form and process it. This request should look something like this:
{
"file_id": "some-id",
"user_id": "some-id",
"content": "some-content"
}
If the file_id exists, update the content in S3. Otherwise, upload a new file to S3 and then create a new entry in DynamoDB. You'd then, of course, have to verify that the user submitting the changes actually owns the file; if you're using UUIDs it shouldn't be too much of a problem, but it's still worth checking in case an ID is leaked somehow.
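The save step is symmetric: put_object both creates and overwrites a key, so "update" and "new upload" are the same S3 call. Again a Python sketch with placeholder names:

import boto3

s3 = boto3.client('s3')

def save_file_content(user_id, file_id, content):
    # Overwrites the object if the key exists, creates it otherwise
    s3.put_object(
        Bucket='your-content-bucket',
        Key='%s/%s.txt' % (user_id, file_id),
        Body=content.encode('utf-8'),
        ContentType='text/plain',
    )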
This way, you don't need to worry about uploading/downloading files as CPU-intensive tasks, so you can keep your costs low as well as using very little RAM in your functions (128MB should be more than enough); after all, you're now only serving text. Not only will this simplify your design, it will also make things simpler both in API Gateway and in your code, as you won't have to deal with binary types. The most you'll do is convert the buffer from S3 to a String when serving some content, which should be completely fine.
EDIT
On your question regarding whether you should upload it from the browser or not, I suggest you take a look into this answer where I cover the pros/cons of doing it via API Gateway vs from the Browser.

Django: how to upload a file directly to a 3rd-party storage server, like Cloudinary or S3

As I understand it, the upload process currently works like this:
1. Django builds the HTTP request object and populates request.FILES using an upload handler.
2. In views.py, the FieldFile instance (the mirror of the FileField) calls storage.save() to upload the file.
So, as you see, Django always passes the data through memory or disk; if your file is too large, this costs too much time.
The design I have in mind to solve this problem is a custom upload handler that calls storage.save() with the raw input data directly. The only question is: how can I modify the behavior of FileField?
Thanks for any help.
You can use this package, which adds direct-to-S3 uploads with a progress bar to file input fields:
https://github.com/bradleyg/django-s3direct
You can use one of the following packages:
https://github.com/cloudinary/pycloudinary
http://django-storages.readthedocs.io/en/latest/backends/amazon-S3.html

How to use Django ImageField, and why use it at all?

Up until now, I've been storing my image filenames in a CharField and saving the actual file directly to S3. This was a fine solution for my own usage. I'd like to reconsider using an ImageField, since now there will be other users and file input validation would be appropriate.
I have a couple of questions that weren't exactly answered after reading the docs and the source code for FileField (which appears to be essentially ImageField minus the Pillow check and dimension field updating functionality).
1) Why use an ImageField at all? Or rather, why use a FileField? Sure, it's convenient for quick-and-easy forms and convenient for inserting into Django templates. But are there any substantial reasons, e.g. is it evidently secured against exploits and malicious uploads?
2) How to write to the field file? Given that the file can be read via instance.imagefield (or is it instance.imagefield.file?), can I simply do the following to write to it?
@receiver(pre_save, sender=Image)
def pre_save_image(sender, instance, *args, **kwargs):
    instance.imagefield = process_image(instance.imagefield)
3) How to try saving with a specific filename, then try again with a new filename if that randomly generated filename already exists? For example with my code right now I do this, how can it be done with ImageField? I want to do it at the model layer, because if I do repeated tries at the view layer then the pre_save processing would run again which is ghetto (even though it's unlikely that it'll have a second try ever in the lifetime of the service).
for i in range(tries):
    try:
        name = generate_random_name()
        media_storage.save(name + '.jpg', ContentFile(final_bytes))
        break
    except:
        pass
4) In the models.py pre_save and post_save signals and in the actual model's save(), how can I tell if a file came in with the request? i.e. I want to know if a new image is incoming to be saved, or if there is no image (some other field in the object is being updated and the image itself remains unchanged).
I don't see any advantage of FileField or ImageField over what you are doing today. In fact, as I see it, the proper/modern/scalable way to deal with uploads is to have the client (browser) upload files directly to S3.
If done correctly (from a security standpoint), this scheme allows you to scale incredibly well without adding more computing power on your side. As an example, consider 100 people uploading a picture at the same time. Your server would need to receive all that data, only to upload it again to S3. On the other hand, you can have a thousand people upload at the same time, and I can assure you AWS can handle it. Your server only needs to handle the signing of the URL, which is a lot less work.
Take a look at fine-uploader, as a good technology to use to handle the efficient upload to s3 (loading in chunks, error checking, etc): http://docs.fineuploader.com/endpoint_handlers/amazon-s3.html. Google "django fineuploader" to find a sample application for Django.
In my case, I use a Model with a couple of CharFields (bucket, key) plus a few other things specific to my application. My data flow is as follows:
Django serves a page with the fine-uploader widget, configured based on my settings.
Fine-uploader requests a signed URL from the Django server (an endpoint), and uses that to upload to S3 directly (see the signing sketch after these steps).
When the upload completes, fine-uploader makes another request to my server to register the completion of the upload, at which point I create my object in the database. This way, if the upload fails, I never create an object in the database.
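For illustration, a hedged sketch of what such a signing endpoint can look like, using boto3's generate_presigned_post. Note that Fine Uploader actually speaks its own signing protocol (it POSTs a policy document for you to sign), so treat this as the general shape rather than its exact contract; the bucket name and URL parameter are placeholders:

import boto3
from django.http import JsonResponse

s3 = boto3.client('s3')

def sign_upload(request):
    key = request.GET['key']  # hypothetical: the client proposes an object key
    post = s3.generate_presigned_post(
        Bucket='your-upload-bucket',
        Key=key,
        ExpiresIn=300,  # the signed policy expires after 5 minutes
    )
    # post['url'] is the POST target; post['fields'] go in the form body
    return JsonResponse(post)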
On the AWS side, S3 triggers a Lambda function, which I use to create a thumbnail, and store it back to S3. So, I don't even use my own CPU (e.g. Celery) for resizing. So you see, not only can I have thousands of users uploading at the same time, but I can resize those thousand pictures in parallel, and for less than what an EC2 worker will cost me.
My Django Model is also used as a wrapper to manage the business logic (e.g. functions like get_original_url() and get_thumbnail_url()), so after the uploads it is easy for my templates to get the signed read-only URLs.
In short, you can implement your own version of Fineuploader if you want, or use one of the many alternatives, but assuming you follow the recommended security best practices on the AWS side (e.g. create a special IAM user with only write permission for the client, even if you are using signed URLs), this, IMO, is the best practice for dealing with uploads, especially if you are using S3 or similar to store these files.
Sorry if I am only really answering question 1, but questions 2 and 3 don't apply if you accept my answer for 1.
1) Why use an ImageField at all? Or rather, why use a FileField?
It's convenient for quick-and-easy forms and convenient for inserting into Django templates.
But are there any substantial reasons, eg. Is it evidently secured against exploits and malicious uploads?
Yes. I daresay your own code probably does it too, but for a newbie, using the FileField will probably ensure that your important system files are not getting overwritten by a malicious upload.
2) How to write to the field file?
In your situation you would need to use a special storage backend that makes it possible to write directly to Amazon S3. As you know, the storage backends for FileField and ImageField are pluggable. Here is one example plugin: http://django-storages.readthedocs.io/en/latest/backends/amazon-S3.html
There is sample code there which demonstrates how it can be written to, so I will not go into that.
3) How to try saving with a specific filename, then try again with a new filename if that randomly generated filename already exists?
ImageField and FileField take care of this for you automatically: a new filename is generated if the old one exists. The code in my answer here did that automatically when I called it over and over again. Here are some sample filenames produced (the input being bada.png):
"4", "media/bada.png"
"5", "media/bada_aH0gV7t.png"
"7", "media/bada_XkzthgK.png"
"8", "media/bada_YzZuwDi.png"
"9", "media/bada_wpkasI3.png"
4) In the models.py pre_save and post_save signals and in the actual model's save(), how can I tell if a file came in with the request?
In pre_save, if this is a new image upload, your instance.pk will be None; if this is a modification to an existing object, the pk will be set.
Took me forever to learn how to save an image using ImageField. Turns out it's crazy easy -- once you know how to do it, it is, at least. I mean, it all comes together sensibly after you see it.
So basically, you're working with a FileField. I already looked into the differences between ImageField and FileField:
ImageField takes everything FileField takes in terms of attributes, but ImageField also takes width and height attributes if indicated.
ImageField, unlike FileField, validates an upload, making sure it's an image.
Using ImageField comes down to most of the same constructs as FileField does. The biggest things to remember:
request.FILES['name_of_field']
So a form is generated from something in forms.py (or wherever your forms are) like this:
imgfile = forms.ImageField(label='Choose your image',
                           help_text='The image should be cool.')
In the model, you might have this in correspondence:
imgfile = models.ImageField(upload_to='images/%m/%d')
So there will be a POST request from the user (when the user completes the form). That request will contain, basically, a dictionary of data. The dictionary holds the submitted files. To get at the file from the field (in our case, an ImageField named imgfile), you would use:
request.FILES['imgfile']
You would use that when you construct the model object (instantiating your model class):
newPic = Pic(imgfile=request.FILES['imgfile'])
To save that the simple way, you'd just use the save() method bestowed upon your object (because Django is that awesome):
if form.is_valid():
    newPic = Pic(imgfile=request.FILES['imgfile'])
    newPic.save()
Your image will be stored, by default, to the directory you indicate for MEDIA_ROOT in settings.py.
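Putting the pieces above together, a minimal view sketch (Pic and PicForm are the hypothetical names from this answer; the template name and redirect target are made up):

from django.shortcuts import redirect, render
# Pic and PicForm would come from your models.py and forms.py

def upload_pic(request):
    if request.method == 'POST':
        form = PicForm(request.POST, request.FILES)
        if form.is_valid():
            newPic = Pic(imgfile=request.FILES['imgfile'])
            newPic.save()
            return redirect('pic_list')
    else:
        form = PicForm()
    return render(request, 'upload.html', {'form': form})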
The tough part, which isn't really so tough when you catch on, is accessing the image.
In your template, you could have something like this:
<img src="{{ MEDIA_URL }}{{ image.imgfile.name }}">
Where {{ MEDIA_URL }} is something like /media/, as indicated in settings.py and {{ image.imgfile.name }} is the name of the file and the subdirectory you indicated in the model. "image" in this case is just the current image in a loop of images you might create to access each image in the database:
{% for image in images %}
    <img src="{{ MEDIA_URL }}{{ image.imgfile.name }}">
{% endfor %}
Make SURE you configure your urls properly to handle the image or the image won't work. Add this to your urls:
urlpatterns += patterns('',
    url(r'^media/(?P<path>.*)$', 'django.views.static.serve', {
        'document_root': settings.MEDIA_ROOT,
    }),
)