How to use Django ImageField, and why use it at all? - django

Up until now, I've been storing my image filenames in a CharField and saving the actual file directly to S3. This was a fine solution for my own usage. I'd like to reconsider using an ImageField, since now there will be other users and file input validation would be appropriate.
I have a couple of questions that weren't exactly answered after reading the docs and the source code for FileField (which appears to be essentially ImageField minus the Pillow check and dimension field updating functionality).
1) Why use an ImageField at all? Or rather, why use a FileField? Sure, it's convenient for quick-and-easy forms and convenient for inserting to Django templates. But are there any substantial reasons, eg. Is it evidently secured against exploits and malicious uploads?
2) How to write to the field file? If it is correct that the file can be read by instance.imagefield (or is it instance.imagefield.file?), if I want to write to it can I simply do the following?
from django.db.models.signals import pre_save
from django.dispatch import receiver
@receiver(pre_save, sender=Image)
def pre_save_image(sender, instance, *args, **kwargs):
    instance.imagefield = process_image(instance.imagefield)
3) How to try saving with a specific filename, then try again with a new filename if that randomly generated filename already exists? For example, this is what my code does right now; how can it be done with ImageField? I want to do it at the model layer, because if I retry at the view layer then the pre_save processing would run again, which is clumsy (even though a second try is unlikely ever to happen in the lifetime of the service).
for i in range(tries):
    try:
        name = generate_random_name()
        media_storage.save(name + '.jpg', ContentFile(final_bytes))
        break
    except Exception:
        # Name collision or storage error: retry with a new random name.
        pass
4) In the models.py pre_save and post_save signals and in the actual model's save(), how can I tell if a file came in with the request? i.e. I want to know if a new image is incoming to be saved, or if there is no image (some other field in the object is being updated and the image itself remains unchanged).

I don't see any advantage of FileField or ImageField over what you are doing today. In fact, as I see it, the proper/modern/scalable way to deal with uploads is to have the client (browser) upload files directly to S3.
If done correctly (from a security standpoint), this scheme lets you scale enormously without adding more computing power on your side. As an example, consider 100 people uploading a picture at the same time. Your server would need to receive all of that data, only to upload it again to S3. With direct uploads, you can have 1,000 people uploading at the same time, and I can assure you AWS can handle it. Your server only needs to handle the signing of the URL, which is a lot less work.
Take a look at fine-uploader, as a good technology to use to handle the efficient upload to s3 (loading in chunks, error checking, etc): http://docs.fineuploader.com/endpoint_handlers/amazon-s3.html. Google "django fineuploader" to find a sample application for Django.
In my case, I use a Model with a couple CharFields (bucket, key) plus a few other things specific to my application. My data flow is as follows:
1. Django serves a page with the Fine Uploader widget, configured based on my settings.
2. Fine Uploader requests a signed URL from the Django server (endpoint), and uses that to upload to S3 directly.
3. When the upload is complete, Fine Uploader makes another request to my server to register the completion of the upload, at which time I create my object in the database. This way, if the upload fails, I never create an object in the database.
4. On the AWS side, S3 triggers a Lambda function, which I use to create a thumbnail and store it back to S3. So I don't even use my own CPU (e.g. Celery) for resizing. Not only can I have thousands of users uploading at the same time, but I can resize those thousand pictures in parallel, and for less than what an EC2 worker would cost me.
5. My Django Model is also used as a wrapper to manage the business logic (e.g. functions like get_original_url() and get_thumbnail_url()), so after the uploads it is easy for my templates to get the signed read-only URLs.
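To make the signing step in the flow above more concrete, here is a minimal sketch of what a server-side endpoint could compute for a browser POST upload, using only the standard library. It follows the AWS Signature Version 4 scheme for POST policies as I understand it. The function name sign_upload_policy, its parameters, the bucket, key and credentials are all made up for illustration, and this is not Fine Uploader's code or taken from any real application. In practice you would more likely call boto3's generate_presigned_post, which does the equivalent for you.

```python
import base64
import datetime
import hashlib
import hmac
import json


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sign_upload_policy(bucket, key, access_key, secret_key, region,
                       now, expires_in=300, max_bytes=10 * 1024 * 1024):
    """Return the form fields a browser would POST to S3 with one file.

    Hypothetical helper: `now` must be a UTC datetime.
    """
    date_stamp = now.strftime("%Y%m%d")
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    credential = f"{access_key}/{date_stamp}/{region}/s3/aws4_request"
    expiration = now + datetime.timedelta(seconds=expires_in)

    # The policy restricts what the client may upload with this signature.
    policy = {
        "expiration": expiration.strftime("%Y-%m-%dT%H:%M:%S.000Z"),
        "conditions": [
            {"bucket": bucket},
            {"key": key},
            ["content-length-range", 1, max_bytes],
            {"x-amz-algorithm": "AWS4-HMAC-SHA256"},
            {"x-amz-credential": credential},
            {"x-amz-date": amz_date},
        ],
    }
    policy_b64 = base64.b64encode(json.dumps(policy).encode("utf-8")).decode("ascii")

    # SigV4 key derivation: secret -> date -> region -> service -> "aws4_request".
    signing_key = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    signing_key = _hmac_sha256(signing_key, region)
    signing_key = _hmac_sha256(signing_key, "s3")
    signing_key = _hmac_sha256(signing_key, "aws4_request")
    signature = hmac.new(signing_key, policy_b64.encode("ascii"),
                         hashlib.sha256).hexdigest()

    return {
        "key": key,
        "policy": policy_b64,
        "x-amz-algorithm": "AWS4-HMAC-SHA256",
        "x-amz-credential": credential,
        "x-amz-date": amz_date,
        "x-amz-signature": signature,
    }


if __name__ == "__main__":
    # Placeholder values only; never hard-code real credentials.
    fields = sign_upload_policy(
        "example-bucket", "uploads/photo.jpg",
        "AKIDEXAMPLE", "not-a-real-secret", "us-east-1",
        now=datetime.datetime.now(datetime.timezone.utc),
    )
    print(sorted(fields))
```

The returned dictionary is what the upload widget would send as form fields alongside the file. The secret key never leaves the server, which is why the server only has to do this small amount of hashing per upload.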
In short, you can implement your own version of Fine Uploader if you want, or use one of the many alternatives, but assuming you follow the recommended security best practices on the AWS side (e.g. create a dedicated IAM user with only write permission for the client, even if you are using signed URLs), this, IMO, is the best practice for dealing with uploads, especially if you are using S3 or similar to store these files.
Sorry if I am only really answering question 1, but questions 2 and 3 don't apply if you accept my answer for 1.

1) Why use an ImageField at all? Or rather, why use a FileField?
It's convenient for quick-and-easy forms and convenient for inserting
to Django templates.
But are there any substantial reasons, eg. Is it evidently secured against exploits and malicious uploads?
Yes. I daresay your own code probably does it too, but for a newbie, using the FileField will probably ensure that your important system files are not overwritten by a malicious upload.
2) How to write to the field file?
In your situation you would need a storage backend that can write directly to Amazon S3. As you know, the storage backends for FileField and ImageField are pluggable. Here is one example plugin: http://django-storages.readthedocs.io/en/latest/backends/amazon-S3.html
It includes sample code that demonstrates how to write to it, so I will not go into that.
3) How to try saving with a specific filename, then try again with a new filename if that randomly generated filename already exists?
ImageField and FileField take care of this for you automatically: a new filename is created if the old one already exists. The code in my answer here did that automatically when I called it over and over again. Here are some sample filenames produced (input being bada.png):
"4", "media/bada.png"
"5", "media/bada_aH0gV7t.png"
"7", "media/bada_XkzthgK.png"
"8", "media/bada_YzZuwDi.png"
"9", "media/bada_wpkasI3.png"
4) In the models.py pre_save and post_save signals and in the actual model's save(), how can I tell if a file came in with the request?
If this is a new image upload, then in pre_save your instance.pk will be None.
If this is a modification to an existing file, the PK will be set.
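Going back to question 3: the renaming behind the sample filenames above can be sketched in plain Python. This is a simplified illustration of the idea (append a random 7-character suffix until the name is free), not Django's actual storage code; available_name and the exists callable are hypothetical names.

```python
import posixpath
import random
import string


def available_name(name, exists, suffix_len=7):
    """Return `name`, or `name` plus a random suffix if it is already taken.

    `exists` is any callable that reports whether a name is in use.
    """
    directory, filename = posixpath.split(name)
    root, ext = posixpath.splitext(filename)
    alphabet = string.ascii_letters + string.digits
    candidate = name
    while exists(candidate):
        suffix = "".join(random.choice(alphabet) for _ in range(suffix_len))
        candidate = posixpath.join(directory, f"{root}_{suffix}{ext}")
    return candidate


taken = {"media/bada.png"}
# Name is taken: a random suffix is inserted before the extension.
print(available_name("media/bada.png", taken.__contains__))
# Name is free: it comes back unchanged.
print(available_name("media/new.png", taken.__contains__))
```

Checking and then saving is not atomic, so two simultaneous uploads could in principle still pick the same name; the sketch ignores that.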

Took me forever to learn how to save an image using ImageField. Turns out it's crazy easy -- once you know how to do it, it is, at least. I mean, it all comes together sensibly after you see it.
So basically, you're working with a FileField. I already looked into the differences between ImageField and FileField:
ImageField takes everything FileField takes in terms of attributes,
but ImageField also takes a width and height attribute if indicated.
ImageField, unlike FileField, validates an upload, making sure it's
an image.
Using ImageField comes down to most of the same constructs as FileField does. The biggest things to remember:
request.FILES['name_of_field']
So a form is generated from something in forms.py (or wherever your forms are) like this:
imgfile = forms.ImageField(label='Choose your image',
                           help_text='The image should be cool.')
In the model, you might have this in correspondence:
imgfile = models.ImageField(upload_to='images/%m/%d')
So there will be a POST request from the user (when the user completes the form). That request will contain basically a dictionary of data. The dictionary holds the submitted files. To focus the request on the file from the field (in our case, an ImageField), you would use:
request.FILES['imgfile']
You would use that when you construct the model object (instantiating your model class):
newPic = Pic(imgfile=request.FILES['imgfile'])
To save that the simple way, you'd just use the save() method bestowed upon your object (because Django is that awesome):
if form.is_valid():
    newPic = Pic(imgfile=request.FILES['imgfile'])
    newPic.save()
Your image will be stored, by default, to the directory you indicate for MEDIA_ROOT in settings.py.
The tough part, which isn't really so tough when you catch on, is accessing the image.
In your template, you could have something like this:
<img src="{{ MEDIA_URL }}{{ image.imgfile.name }}"></img>
Where {{ MEDIA_URL }} is something like /media/, as indicated in settings.py and {{ image.imgfile.name }} is the name of the file and the subdirectory you indicated in the model. "image" in this case is just the current image in a loop of images you might create to access each image in the database:
{% for image in images %}
    <img src="{{ MEDIA_URL }}{{ image.imgfile.name }}">
{% endfor %}
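Make SURE you configure your URLs to serve the media files, or the image won't load. In development, add this to your urls.py:
from django.conf import settings
from django.conf.urls.static import static
# Serves files from MEDIA_ROOT at MEDIA_URL; only active when DEBUG is True.
# (The older patterns() helper was removed in Django 1.10.)
urlpatterns += static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT)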

Related

Django: Insert image to PostgreSQL (PgAdmin4)

I'm currently working on an e-shop. My idea is to store images with Django models in PgAdmin4. As I saw in older posts, methods like bytea('D:\image.jpg') and so on just convert the string constant to its binary representation.
So my question is if there is a newer method to store the actual image, or if it is possible to grab the image via a path?
models.py
image = models.ImageField(null=True, blank=True)
PgAdmin4
INSERT INTO product_images (id, image)
VALUES (SERIAL, ?); -- how to insert the image?
There are several options for storing images. The first, which I recommend, is to use a storage service like S3. You can read this article for more detailed information. I have also used a ready-made third-party package for S3 with Django and can recommend it. With this option, the ImageField stores the file's path in S3.
Another option, if you are using only one server, is to keep the pictures on that server's local disk. Again, the ImageField stores the path.
If you want to keep the image directly in the database, you can follow this link. Currently, there is no newer method for it.
But I have to say that I think using a storage service like S3 is the best way under all circumstances.

Django how to upload file directly to 3rd-part storage server, like Cloudinary, S3

Now, I have realized the uploading process is like that:
1. Generate the HTTP request object, and populate request.FILES using an upload handler.
2. In views.py, the FieldFile instance, which mirrors the FileField, calls storage.save() to upload the file.
So, as you can see, Django always passes the data through memory or disk first; if the file is very large, this costs too much time.
The design I have in mind to solve this is a custom upload handler that calls storage.save() on the raw input data. The only question is: how can I modify the behavior of FileField?
Thanks for any help.
You can use this package:
Add direct uploads to AWS S3 functionality with a progress bar to file input fields.
https://github.com/bradleyg/django-s3direct
You can use one of the following packages
https://github.com/cloudinary/pycloudinary
http://django-storages.readthedocs.io/en/latest/backends/amazon-S3.html

Timing issue with image upload on Heroku + Django Rest Framework + s3

OK, this is a mix of an architecture design question and a code question. I've had this problem on a couple of applications and I'm not sure how to handle it.
The problem is fairly basic, an app which allows users to attach files/images to an object.
Let's say I have the following Django models: Document and Attachment.
class Document(models.Model):
    body = models.TextField(blank=True, help_text='Plain text.')
class Attachment(models.Model):
    document = models.ForeignKey(Document, on_delete=models.CASCADE, related_name='attachments')
    attachment = models.FileField(upload_to=get_attachment_path, max_length=255, null=False)
    filesize = models.IntegerField(default=0)
    filetype = models.CharField(blank=True, max_length=100)
    orientation = models.IntegerField(default=0)
The get_attachment_path callable simply builds a path based on the document pk and the user.
When a user creates a document, he can attach files to it. As it's a modern world, you want to be able to upload the files before creating the document, so I have a TempUpload object and a direct connection from the web application to S3 (using pre-signed URL).
When the user clicks save on the document form, I get an array of all the TempUpload objects that I need to attach to the new document.
Now here's the problem... within the heroku 30s timeout constraint, I need to:
create the document (quite fast)
iterate the array of TempUpload objects, and create the Attachment
copy the file to its final destination (using boto3 copy_object)
get the filetype (using the magic library, I only query the first 128 bytes, but still.. one roundtrip to S3)
get the file size (another roundtrip to S3)
get the orientation (only if the attachment is an image, also based on the EXIF data and using streaming)
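One way to cut the two S3 round trips above down to one, sketched under some assumptions: a ranged GET for the first 128 bytes returns the bytes themselves plus a Content-Range header such as "bytes 0-127/6291456", whose last number is the full object size (with boto3 I believe this is the ContentRange key of the get_object response). The helpers below are hypothetical and standard-library only, and the signature table is a tiny stand-in for what the magic library does, not a replacement for it.

```python
def sniff_image_type(head: bytes):
    """Guess a MIME type from the leading bytes of a file, or return None."""
    signatures = (
        (b"\xff\xd8\xff", "image/jpeg"),
        (b"\x89PNG\r\n\x1a\n", "image/png"),
        (b"GIF87a", "image/gif"),
        (b"GIF89a", "image/gif"),
    )
    for magic_bytes, mime in signatures:
        if head.startswith(magic_bytes):
            return mime
    return None


def total_size_from_content_range(header: str) -> int:
    """Parse the total length out of a header like 'bytes 0-127/6291456'.

    Raises ValueError if the server reports an unknown total ('*').
    """
    return int(header.rsplit("/", 1)[1])


# Values that would normally come from a single ranged GET response.
head = b"\xff\xd8\xff\xe0" + b"\x00" * 124
content_range = "bytes 0-127/6291456"
print(sniff_image_type(head), total_size_from_content_range(content_range))
```

With this shape, the file type and file size for each attachment come out of the same response, leaving only the EXIF orientation as a separate read.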
I've already moved the thumbnail generation to an external service. But with bigger files (standard camera pictures can easily be 6-10 MB), I can have a processing delay of almost 1 second per file, meaning that if the user uploads more than 20 images, it gets very close to the Heroku timeout...
I'm currently using Celery and Redis to move most of the processing outside the response lifecycle, but it's not really nice... the document can be requested before the async task completes, in which case some info is not available yet (size/orientation).
I think this should actually be a fairly standard feature; I'd be interested to know how you implement it.
Edit:
Just to add some strategies I'm thinking of:
get the info at the initial upload instead of when saving the document. This would work but the problem is that I don't know when the upload is completed (this happens directly between the web app and S3)
use a Lambda function?
use S3 tags instead of moving the file to easily differentiate the "saved" files and the ones that should be deleted ?
For testing and dev I'd like to keep as much logic as possible within the django app.. but hey, maybe that's not possible...

How to mix Django, Uploadify, and S3Boto Storage Backend?

Background
I'm doing fairly big file uploads on Django. File size is generally 10MB-100MB.
I'm on Heroku and I've been hitting the request timeout of 30 seconds.
The Beginning
In order to get around the limit, Heroku's recommendation is to upload from the browser DIRECTLY to S3.
Amazon documents this by showing you how to write an HTML form to perform the upload.
Since I'm on Django, rather than write the HTML by hand, I'm using django-uploadify-s3 (example). This provides me with an SWF object, wrapped in JS, that performs the actual upload.
This part is working fine! Hooray!
The Problem
The problem is in tying that data back to my Django model in a sane way.
Right now the data comes back as a simple URL string, pointing to the file's location.
However, I was previously using S3 Boto from django-storages to manage all of my files as FileFields, backed by the delightful S3BotoStorageFile.
To reiterate, S3 Boto is working great in isolation, Uploadify is working great in isolation, the problem is in putting the two together.
My understanding is that the only way to populate the FileField is by providing both the filename AND the file content. When you're uploading files from the browser to Django, this is no problem, as Django has the file content in a buffer and can do whatever it likes with it. However, when doing direct-to-S3 uploads like me, Django only receives the file name and URL, not the binary data, so I can't properly populate the FieldFile.
Cry For Help
Anyone know a graceful way to use S3Boto's FileField in conjunction with direct-to-S3 uploading?
Else, what's the best way to manage an S3 file just based on its URL? Including setting expiration, key id, etc.
Many thanks!
Use a URLField.
I had a similar issue where I wanted to store a file to S3 either directly using a FileField, or let the user input the URL directly. To get around that, I used two fields in my model, one FileField and one URLField. In the template I could use 'or' to see which one exists and use that, like {{ instance.filefield or instance.url }}.
This is untested, but you should be able to use:
from django.core.files.storage import default_storage
f = default_storage.open('name_you_expect_in_s3', 'r')
#f is an instance of S3BotoStorageFile, and can be assigned to a field
obj, created = YourObject.objects.get_or_create(**stuff_you_know)
obj.s3file_field = f
obj.save()
I think this should set up the local pointer to S3 and save it, without overwriting the content.
ETA: You should do this only after the upload completes on S3 and you know the key in s3.
Check out django-filetransfers. Looks like it plays nicely with django-storages.
I've never used django, so ymmv :) but why not just write a single byte to populate the content? That way, you can still use FieldFile.
I'm thinking that writing actual SQL may be the easiest solution here. Alternatively you could subclass S3BotoStorage, override the _save method and allow for an optional kwarg of filepath which sidesteps all the other saving stuff and just returns the cleaned_name.

Upload image to Django admin, crop and scale, and send it to Amazon S3 without saving the file locally?

I want to allow users to upload an image through the Django admin, crop and scale that image in memory (probably using PIL), and save it to Amazon S3 without saving the image on the local filesystem. I'll save the image path in my database, but that is the only aspect of the image that is saved locally. I'd like to integrate this special image upload widget into the normal model form on the admin edit page.
This question is similar, except the solution is not using the admin interface.
Is there a way that I can intercept the save action, do manipulations and saving of the image to S3, and then save the image path and the rest of the model data like normal? I have a pretty good idea of how I would crop and scale and save the image to S3 if I can just get access to the image data.
See https://docs.djangoproject.com/en/dev/topics/http/file-uploads/#changing-upload-handler-behavior
If images are smaller than a particular size, they will already be stored only in memory, so you can likely tune the FILE_UPLOAD_MAX_MEMORY_SIZE setting to suit your needs. Additionally, you'll have to make sure that you don't access the .path field of these uploaded images, because that will cause them to be written out to a file. Instead, use (for example) the .read() method. I haven't tested this, but I believe this will work:
from PIL import Image
image = Image.open(request.FILES['my_file'])
Well, if you don't want to touch the admin part of Django, then you can do the scaling in the model's save() method.
When you use the ImageField, Django can do the saving for you. Note that its height_field and width_field options only record the image's dimensions on the model; they do not resize it.
https://docs.djangoproject.com/en/dev/ref/models/fields/#imagefield
For uploading to S3 I really suggest using django-storages backends from:
https://bitbucket.org/david/django-storages/src (preferably S3-boto version)
That way you basically will not have to write any code yourself. You can just use available libraries and solutions that people have tested.