Is there a way to modify the storage backend used by a Django FileField?

I am saving some uploaded files with a Django FileField set to use the DefaultStorage backend. At some point after a file has been uploaded I'd like to move it to a different storage backend, i.e. change the FileField's storage attribute (obviously after saving the contents of the source file to the new storage location). Simply changing the FileField instance's storage doesn't seem to work.
Is this possible without adding a second FileField model attribute configured with a different storage backend? Ideally I'd like not to double up on fields and put switches in every template that references the files.
Thanks!

It seems that the storage associated with a FileField is not written to the database, so setting it on a particular field instance won't persist. Instead it is read from the FileField declaration on the model (so if you have file = models.FileField(..., storage=some_storage), it is reset to some_storage every time Django sets up the models).
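Since only the file's name is persisted, one workaround is to copy the bytes into the new storage yourself and re-point the field. An untested sketch (attachment and new_storage are placeholders, not names from the question):

from django.core.files.base import ContentFile

def move_to_storage(instance, new_storage):
    field_file = instance.attachment               # placeholder FileField
    old_storage, name = field_file.storage, field_file.name
    content = ContentFile(old_storage.open(name).read())
    saved_name = new_storage.save(name, content)   # storage may alter the name
    instance.attachment.name = saved_name          # only the name is persisted
    instance.save()
    old_storage.delete(name)                       # optional cleanup

Reads will only resolve correctly once the model's FileField is declared with storage=new_storage, per the caveat above.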

Django File object and S3

So I have added S3 support (django-storages and boto3) to one of my Django projects.
I have a model with a file field holding a zip archive of images.
At some point I need to access this zip archive and parse it to create instances of another model from the images in the archive. It looks something like this:
1. Access the archive data with zipfile
2. Get an image from it
3. Put the image into a Django File object
4. Add the file object to the model field
5. Save the model
It works perfectly fine without S3; however, with it I get an UnsupportedOperation: seek error.
My guess is that boto3/django-storages does not support uploading to S3 from in-memory files. Is that the case? If so, how do I fix or avoid this in this kind of situation?
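One way that seems to avoid this (untested sketch; Photo and the field names are placeholders) is to read each member's raw bytes and wrap them in ContentFile, which is backed by a seekable BytesIO, so the storage backend's seek() call succeeds:

import zipfile

from django.core.files.base import ContentFile

def extract_images(archive_instance):
    # archive_instance.archive is the FileField holding the zip
    with zipfile.ZipFile(archive_instance.archive) as zf:
        for member in zf.namelist():
            data = zf.read(member)  # raw bytes, fully in memory
            photo = Photo()         # placeholder target model
            photo.image.save(member, ContentFile(data), save=True)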

Django how to upload file directly to 3rd-part storage server, like Cloudinary, S3

Now, I have realized the uploading process works like this:
1. Django generates the HTTP request object and sets request.FILES via an upload handler.
2. In views.py, the FieldFile instance (the mirror of the FileField) calls storage.save() to upload the file.
So, as you see, Django always passes the data through memory or disk first; if your file is large, this costs too much time.
The design I have in mind to solve this is a custom upload handler that calls storage.save() with the raw input data, roughly as sketched below. The only question is: how can I modify the behaviour of FileField?
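Here is roughly the handler I have in mind (untested sketch; it still buffers chunks in memory, whereas a real version would feed each chunk into something like an S3 multipart upload):

from django.core.files.uploadedfile import SimpleUploadedFile
from django.core.files.uploadhandler import FileUploadHandler, StopFutureHandlers

class DirectToStorageHandler(FileUploadHandler):
    def new_file(self, *args, **kwargs):
        super(DirectToStorageHandler, self).new_file(*args, **kwargs)
        self.chunks = []            # stand-in for a streaming upload
        raise StopFutureHandlers()  # claim the file for this handler

    def receive_data_chunk(self, raw_data, start):
        # a real implementation would push raw_data to remote storage here
        self.chunks.append(raw_data)
        return None                 # None = don't pass data to later handlers

    def file_complete(self, file_size):
        return SimpleUploadedFile(self.file_name, b''.join(self.chunks),
                                  self.content_type)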
Thanks for any help.
You can use this package, which adds direct uploads to AWS S3 with a progress bar on file input fields:
https://github.com/bradleyg/django-s3direct
You can use one of the following packages:
https://github.com/cloudinary/pycloudinary
http://django-storages.readthedocs.io/en/latest/backends/amazon-S3.html

How to use Django ImageField, and why use it at all?

Up until now, I've been storing my image filenames in a CharField and saving the actual file directly to S3. This was a fine solution for my own usage. I'd like to reconsider using an ImageField, since now there will be other users and file input validation would be appropriate.
I have a couple of questions that weren't exactly answered after reading the docs and the source code for FileField (which appears to be essentially ImageField minus the Pillow check and dimension field updating functionality).
1) Why use an ImageField at all? Or rather, why use a FileField? Sure, it's convenient for quick-and-easy forms and for inserting into Django templates. But are there any substantial reasons, e.g. is it demonstrably secured against exploits and malicious uploads?
2) How to write to the field file? If it's correct that the file can be read via instance.imagefield (or is it instance.imagefield.file?), can I simply do the following to write to it?
from django.db.models.signals import pre_save
from django.dispatch import receiver

@receiver(pre_save, sender=Image)
def pre_save_image(sender, instance, *args, **kwargs):
    instance.imagefield = process_image(instance.imagefield)
3) How to try saving with a specific filename, then try again with a new filename if that randomly generated filename already exists? For example, with my current code I do the following; how can it be done with ImageField? I want to do it at the model layer, because repeated tries at the view layer would re-run the pre_save processing, which is ugly (even though it's unlikely there will ever be a second try in the lifetime of the service).
for i in range(tries):
    try:
        name = generate_random_name()
        media_storage.save(name + '.jpg', ContentFile(final_bytes))
        break
    except:
        pass
4) In the models.py pre_save and post_save signals and in the actual model's save(), how can I tell if a file came in with the request? i.e. I want to know if a new image is incoming to be saved, or if there is no image (some other field in the object is being updated and the image itself remains unchanged).
I don't see any advantage of FileField or ImageField over what you are doing today. In fact, as I see it, the proper/modern/scalable way to deal with uploads is to have the client (browser) upload files directly to S3.
If done correctly (from a security standpoint), this scheme allows you to scale incredibly well without adding more compute power on your side. As an example, consider 100 people uploading a picture at the same time: your server has to receive all this data, only to upload it again to S3. On the other hand, you can have a thousand people upload at the same time, and I can assure you AWS can handle it. Your server only needs to handle the signing of the URL, which is a lot less work.
Take a look at fine-uploader, a good technology for handling efficient uploads to S3 (uploading in chunks, error checking, etc.): http://docs.fineuploader.com/endpoint_handlers/amazon-s3.html. Google "django fineuploader" to find a sample application for Django.
In my case, I use a Model with a couple CharFields (bucket, key) plus a few other things specific to my application. My data flow is as follows:
1. Django serves a page with the Fine Uploader widget, configured based on my settings.
2. Fine Uploader requests a signed URL from the Django server (endpoint) and uses that to upload to S3 directly.
3. When the upload is complete, Fine Uploader makes another request to my server to register the completion of the upload, at which time I create my object in the database. This way, if the upload fails, I never create a database object.
On the AWS side, S3 triggers a Lambda function, which I use to create a thumbnail and store it back to S3. So I don't even use my own CPU (e.g. Celery) for resizing. Not only can I have thousands of users uploading at the same time, but I can resize those thousand pictures in parallel, for less than an EC2 worker would cost me.
My Django model is also used as a wrapper to manage the business logic (e.g. functions like get_original_url() and get_thumbnail_url()), so after the uploads it is easy for my templates to get the signed read-only URLs.
In short, you can implement your own version of Fine Uploader if you want, or use one of the many alternatives, but assuming you follow the recommended security best practices on the AWS side (e.g. create a special IAM role with only write permission for the client, even if you are using signed URLs), this, IMO, is the best practice for dealing with uploads, especially if you are using S3 or similar to store these files.
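For reference, the signing endpoint can be quite small. This is an untested sketch using boto3's presigned-POST support; the bucket name and URL wiring are assumptions, and Fine Uploader's own S3 mode uses a slightly different signature protocol:

import boto3
from django.http import JsonResponse

def sign_upload(request):
    s3 = boto3.client('s3')
    post = s3.generate_presigned_post(
        Bucket='my-upload-bucket',             # assumption
        Key='uploads/' + request.GET['name'],  # client-chosen file name
        ExpiresIn=300,                         # valid for 5 minutes
    )
    # the browser then POSTs the file straight to post['url'],
    # including post['fields'] in the form data
    return JsonResponse(post)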
Sorry if I am only really answering question 1, but questions 2 and 3 don't apply if you accept my answer for 1.
1) Why use an ImageField at all? Or rather, why use a FileField?
It's convenient for quick-and-easy forms and convenient for inserting into Django templates.
But are there any substantial reasons, e.g. is it demonstrably secured against exploits and malicious uploads?
Yes. I daresay your own code probably does it too, but for a newbie, using FileField will probably ensure that your important system files don't get overwritten by a malicious upload.
2) How to write to the field file?
In your situation you would need to use a special storage backend that makes it possible to write directly to Amazon S3. As you know, the storage backends for FileField and ImageField are pluggable. Here is one example plugin: http://django-storages.readthedocs.io/en/latest/backends/amazon-S3.html
There is sample code there which demonstrates how it can be written to, so I will not go into that.
3) How to try saving with a specific filename, then try again with a new filename if that randomly generated filename already exists?
ImageField and FileField take care of this for you automatically: a new filename is generated if the old one exists. The code in my answer here did that automatically when I called it over and over again. Here are some sample filenames produced (the input being bada.png):
"4", "media/bada.png"
"5", "media/bada_aH0gV7t.png"
"7", "media/bada_XkzthgK.png"
"8", "media/bada_YzZuwDi.png"
"9", "media/bada_wpkasI3.png"
4) In the models.py pre_save and post_save signals and in the actual model's save(), how can I tell if a file came in with the request?
In pre_save, if this is a new image upload, instance.pk will be None; if this is a modification to an existing record, the pk will be set.
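Going one step further, a common (untested) pattern for telling whether the file itself changed is to compare against the row already in the database; Image and imagefield below match the names used in the question:

from django.db.models.signals import pre_save
from django.dispatch import receiver

@receiver(pre_save, sender=Image)
def detect_new_upload(sender, instance, **kwargs):
    if instance.pk is None:
        new_upload = bool(instance.imagefield)  # brand-new row with a file
    else:
        old = sender.objects.get(pk=instance.pk)
        new_upload = old.imagefield.name != instance.imagefield.name
    if new_upload:
        pass  # only run processing when the image actually changed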
Took me forever to learn how to save an image using ImageField. Turns out it's crazy easy once you know how to do it; it all comes together sensibly after you see it.
So basically, you're working with a FileField. I already looked into the differences between ImageField and FileField:
ImageField takes everything FileField takes in terms of attributes, but it also accepts optional width_field and height_field arguments. ImageField, unlike FileField, validates an upload, making sure it's an image.
Using ImageField comes down to most of the same constructs as FileField does. The biggest things to remember:
request.FILES['name_of_model']
So a form is generated from something in forms.py (or wherever your forms are) like this:
imgfile = forms.ImageField(label='Choose your image',
                           help_text='The image should be cool.')
In the model, you might have this in correspondence:
imgfile = models.ImageField(upload_to='images/%m/%d')
So there will be a POST request from the user (when the user completes the form). That request carries the submitted files in request.FILES, a dictionary keyed by field name. To get the file from our field (in our case, an ImageField), you would use:
request.FILES['imgfile']
You would use that when you construct the model object (instantiating your model class):
newPic = ImageModel(imgfile = request.FILES['imgfile'])
To save that the simple way, you'd just use the save() method bestowed upon your object (because Django is that awesome):
if form.is_valid():
    newPic = Pic(imgfile=request.FILES['imgfile'])
    newPic.save()
Your image will be stored, by default, to the directory you indicate for MEDIA_ROOT in settings.py.
The tough part, which isn't really so tough when you catch on, is accessing the image.
In your template, you could have something like this:
<img src="{{ MEDIA_URL }}{{ image.imgfile.name }}">
Where {{ MEDIA_URL }} is something like /media/, as indicated in settings.py and {{ image.imgfile.name }} is the name of the file and the subdirectory you indicated in the model. "image" in this case is just the current image in a loop of images you might create to access each image in the database:
{% for image in images %}
    <img src="{{ MEDIA_URL }}{{ image.imgfile.name }}">
{% endfor %}
Make SURE you configure your urls properly to handle the image or the image won't work. Add this to your urls:
urlpatterns += patterns('',
    url(r'^media/(?P<path>.*)$', 'django.views.static.serve', {
        'document_root': settings.MEDIA_ROOT,
    }),
)

How to mix Django, Uploadify, and S3Boto Storage Backend?

Background
I'm doing fairly big file uploads on Django. File size is generally 10MB-100MB.
I'm on Heroku and I've been hitting the request timeout of 30 seconds.
The Beginning
In order to get around the limit, Heroku's recommendation is to upload from the browser DIRECTLY to S3.
Amazon documents this by showing you how to write an HTML form to perform the upload.
Since I'm on Django, rather than write the HTML by hand, I'm using django-uploadify-s3 (example). This provides me with an SWF object, wrapped in JS, that performs the actual upload.
This part is working fine! Hooray!
The Problem
The problem is in tying that data back to my Django model in a sane way.
Right now the data comes back as a simple URL string, pointing to the file's location.
However, I was previously using S3 Boto from django-storages to manage all of my files as FileFields, backed by the delightful S3BotoStorageFile.
To reiterate, S3 Boto is working great in isolation, Uploadify is working great in isolation, the problem is in putting the two together.
My understanding is that the only way to populate the FileField is by providing both the filename AND the file content. When you're uploading files from the browser to Django, this is no problem, as Django has the file content in a buffer and can do whatever it likes with it. However, when doing direct-to-S3 uploads like me, Django only receives the file name and URL, not the binary data, so I can't properly populate the FieldFile.
Cry For Help
Anyone know a graceful way to use S3Boto's FileField in conjunction with direct-to-S3 uploading?
Else, what's the best way to manage an S3 file just based on its URL? Including setting expiration, key id, etc.
Many thanks!
Use a URLField.
I had a similar issue where I wanted to store a file to S3 via a FileField, or alternatively let the user input a URL directly. To circumvent that, I used two fields in my model, one FileField and one URLField, and in the template I could use 'or' to pick whichever exists, like {{ instance.filefield or instance.url }}.
This is untested, but you should be able to use:
from django.core.files.storage import default_storage
f = default_storage.open('name_you_expect_in_s3', 'r')
#f is an instance of S3BotoStorageFile, and can be assigned to a field
obj, created = YourObject.objects.get_or_create(**stuff_you_know)
obj.s3file_field = f
obj.save()
I think this should set up the local pointer to S3 and save it, without overwriting the content.
ETA: You should do this only after the upload completes on S3 and you know the key in s3.
Check out django-filetransfers. Looks like it plays nice with django-storages.
I've never used Django, so YMMV :) but why not just write a single byte to populate the content? That way, you can still use FieldFile.
I'm thinking that writing actual SQL may be the easiest solution here. Alternatively you could subclass S3BotoStorage, override the _save method and allow for an optional kwarg of filepath which sidesteps all the other saving stuff and just returns the cleaned_name.
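A rough sketch of that subclassing idea (untested; the skip_upload flag is hypothetical, and _save/_clean_name are django-storages internals that could change):

from storages.backends.s3boto import S3BotoStorage

class DirectUploadStorage(S3BotoStorage):
    def _save(self, name, content):
        # if the bytes already live in S3 (flag set by the view), skip
        # the upload and just return the name to be stored in the DB
        if getattr(content, 'skip_upload', False):
            return self._clean_name(name)
        return super(DirectUploadStorage, self)._save(name, content)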

How can I use South's DataMigration to change the storage backend of Django model ImageField instance?

I'm trying to migrate some models ImageFields to using the S3BotoStorage storage backend from django-storages. As part of this process I've changed my Model's ImageField declaration to include the storage=instance_of_s3botostorage argument, and new instances of my Model that save an image to the ImageField attribute now get stored in S3 - as intended.
I tried to move existing model instances over to storing their data in S3 too, so I wrote a South DataMigration like this:
def forwards(self, orm):
    "upload ImageField file to S3 if it's not already in there"
    for mymodel in orm.MyModel.objects.all():
        if mymodel.logo_image and not isinstance(mymodel.logo_image.storage, S3BotoStorage):
            print "uploading %s to S3" % mymodel.logo_image
            file_contents = ContentFile(mymodel.logo_image.read())
            mymodel.logo_image.save(mymodel.logo_image.name, file_contents)
            mymodel.save()
but this clearly doesn't have the intended effect, because the image file is simply saved using the old storage backend. That makes sense considering save() is actually a method of the FieldFile belonging to the FileField.
So, how to move/change file storage on an instance of a model?
So, it turns out the particular storage used for files is not stored in the database. The 'migration' is then simply a matter of changing the model definition and, outside of the storage subsystem API, uploading the files into the new storage locations.
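Concretely, once the model definition points at the new storage, the upload step can be a small standalone script (untested sketch; it assumes the old files live under MEDIA_ROOT and reuses MyModel/logo_image from the question):

import os

from django.conf import settings
from django.core.files.base import File
from storages.backends.s3boto import S3BotoStorage

s3_storage = S3BotoStorage()
for mymodel in MyModel.objects.exclude(logo_image=''):
    local_path = os.path.join(settings.MEDIA_ROOT, mymodel.logo_image.name)
    with open(local_path, 'rb') as fh:
        # save under the same name so the values already stored in the DB
        # still resolve; note save() may alter the name on a collision
        s3_storage.save(mymodel.logo_image.name, File(fh))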
For this problem I would look at a system more like this: http://github.com/seanbrant/django-queued-storage