If I host a small Django website on Heroku using just one dyno, is it safe to upload media files to that server, or do I need to use AWS S3 to store media files? What are the alternatives for media storage?
No, it is never safe to store things on the Heroku filesystem. Even though you only have one dyno, it is still ephemeral, and can be killed at any time; for example when you push new code.
Using S3 is the way to go (alternatives are the Azure and Google offerings). There are several other advantages to using S3, chiefly the ability to serve files without stressing your small server.
While your site is small, a dyno is very small as well, so a major advantage of S3, if used correctly, is that the AWS S3 infrastructure serves the files for you. By "used correctly", I mean uploading and serving files directly to/from S3, so that your server is only used for signing the S3 URLs and the actual files never pass through it.
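For illustration, a minimal sketch of such a signing endpoint with boto3 (the bucket name, key prefix, and view wiring are placeholders, not anything Heroku-specific):

```python
import boto3
from django.http import JsonResponse

def sign_s3_upload(request):
    """Return a presigned POST so the browser uploads straight to S3."""
    s3 = boto3.client("s3")
    file_name = request.GET["file_name"]  # e.g. "avatar.png"
    presigned = s3.generate_presigned_post(
        Bucket="my-media-bucket",         # placeholder bucket name
        Key=f"uploads/{file_name}",
        ExpiresIn=3600,                   # URL valid for one hour
    )
    # The client POSTs the file to presigned["url"] with presigned["fields"];
    # the file itself never touches this dyno.
    return JsonResponse(presigned)
```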
Check https://devcenter.heroku.com/articles/s3-upload-python and http://docs.fineuploader.com/quickstart/01-getting-started.html (I strongly recommend Fine Uploader if you can use the free version or afford the small license fee).
Obviously, you can also just implement S3 media files in Django using django-storages-redux (since merged back into django-storages), but that means your server will be busy uploading files. If that's OK for your small server, then that's OK too.
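If you take that route, the setup is only a few settings; a sketch, assuming django-storages and boto3 are installed (the bucket name and credentials are placeholders and should come from the environment):

```python
# settings.py -- route Django's default file storage to S3
INSTALLED_APPS += ["storages"]

AWS_ACCESS_KEY_ID = "..."              # placeholder; use environment variables
AWS_SECRET_ACCESS_KEY = "..."
AWS_STORAGE_BUCKET_NAME = "my-media-bucket"

# Every FileField/ImageField upload now goes to S3 instead of the dyno's disk.
DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
```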
Related
I ran into the problem that Heroku doesn't update my GitHub repository (or, say, the static filesystem) when a blog post (including pictures) is created from the website.
Other images survive, whilst the ones saved to my filesystem while the server is running on Heroku disappear.
I found this in their documentation.
The Heroku filesystem is ephemeral - that means that any changes to the filesystem whilst the dyno is running only last until that dyno is shut down or restarted.
I'm still confused about why not all of the pictures disappear, only those added later.
Is AWS S3 a solution for this? If it is, how can I represent my filesystem using buckets?
Say, for Blog Post 1 I have two picture resolutions, which means storing the files in different folders corresponding to those resolutions.
---1920x1920
-----picture.jpg
---800x800
-----picture.jpg
Does that mean I have to create 2 buckets named 1920x1920 and 800x800 or is there a better way of handling them?
Is AWS S3 a solution for this?
S3 is the recommended solution for this, and the configuration is documented in the Heroku Dev Center with specific instructions for uploading from Python.
Note these Python instructions use the Direct Upload approach: have the Flask app generate a pre-signed URL, which is then passed back to the client-side JavaScript, so that the user's browser can upload to S3 directly. The resulting S3 URL of the image is then put into a hidden element in the form, which your app receives on form submit.
The fact that you have separate image sizes suggests your app does some processing (maybe with PIL) to get these thumbnails. In that case it may be easier to use the Pass-Through approach, where your app implements its own upload mechanism, does the processing, and then uploads the thumbnails to S3 (the upload-to-S3 part is well documented, such as in this SO thread).
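A rough sketch of that pass-through flow with Pillow and boto3 (the bucket name, sizes, and helper name are illustrative, not taken from the linked docs):

```python
from io import BytesIO

import boto3
from PIL import Image

def upload_thumbnails(uploaded_file, base_key):
    """Resize an uploaded image in-process and push each size to S3."""
    s3 = boto3.client("s3")
    original = Image.open(uploaded_file).convert("RGB")  # JPEG needs RGB
    for width, height in [(1920, 1920), (800, 800)]:     # placeholder sizes
        thumb = original.copy()
        thumb.thumbnail((width, height))                 # keeps aspect ratio
        buffer = BytesIO()
        thumb.save(buffer, format="JPEG")
        buffer.seek(0)
        s3.upload_fileobj(buffer, "my-media-bucket",
                          f"{base_key}/{width}x{height}/picture.jpg")
```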
The Pass-Through method carries the warning that it may block a single-threaded worker. If your site gets a volume of requests that makes this an issue, you may need to increase the number of gunicorn workers, or change to a worker type that supports concurrency (this GitHub post has some useful commands/info on concurrent worker types).
The best way to implement this whole thing (although the requirement for a Redis add-on and a worker dyno may push you into the paid tier) may be with Background Tasks using rq. You use the Direct Upload approach above to upload the original image, then have a background job download it, do the resizing, and put the resulting thumbnails back onto S3.
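The enqueue side of that might look like this (make_thumbnails is a hypothetical rq job that downloads the original, resizes it, and re-uploads, along the lines of the pass-through sketch above):

```python
# views.py -- after the direct upload succeeds, hand the resizing to rq
from django.http import JsonResponse
from redis import Redis
from rq import Queue

from tasks import make_thumbnails  # hypothetical job module

queue = Queue(connection=Redis())  # in practice, point Redis at the add-on URL

def upload_complete(request):
    # The client reports the S3 key of the original it just uploaded directly.
    original_key = request.POST["key"]
    queue.enqueue(make_thumbnails, "my-media-bucket", original_key)
    return JsonResponse({"status": "queued"})
```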
Does that mean I have to create 2 buckets named 1920x1920 and 800x800 or is there a better way of handling them?
Have one bucket for the entire app, and just include forward slashes in the object keys to mimic a subdirectory structure.
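So rather than one bucket per resolution, you would use keys like these (a sketch; the bucket name is a placeholder):

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-blog-media"  # one bucket for the whole app

# Forward slashes in the object key mimic the folder layout from the question:
s3.upload_file("picture_large.jpg", BUCKET, "blog-post-1/1920x1920/picture.jpg")
s3.upload_file("picture_small.jpg", BUCKET, "blog-post-1/800x800/picture.jpg")
```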
How can I get Django to serve media files in production when DEBUG = False on a Heroku server?
I know that it's better not to do this and that it will cost some performance, but my application is used only by me and my cat, so I don't think the performance loss is a problem in my case.
The reason this won't work is that the Heroku filesystem is ephemeral, meaning any files uploaded after your app's code is pushed will be lost whenever your app restarts. That will leave your app with image links in the DB that point to non-existent files.
You can read more about it here:
https://help.heroku.com/K1PPS2WM/why-are-my-file-uploads-missing-deleted
Your best bet is to upload your files to a bucket service like Amazon S3. It costs almost nothing for small amounts of use, and it is very reliable.
https://blog.theodo.com/2019/07/aws-s3-upload-django/
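For scale: once Django's default file storage points at S3 (for instance via django-storages), a plain ImageField is all a model needs; a minimal sketch:

```python
# models.py -- with DEFAULT_FILE_STORAGE set to an S3 backend,
# uploads land in the bucket and .url resolves to the S3 address.
from django.db import models

class Post(models.Model):
    title = models.CharField(max_length=200)
    image = models.ImageField(upload_to="posts/")

# In a template, <img src="{{ post.image.url }}"> is served from S3, not the dyno.
```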
We used to use the following combination: Django framework with Heroku as the application server and Amazon S3 as the static file server.
But recently we needed to build a system that handles a large amount of video data, with data transfer of more than 10 TB per month. That means Amazon S3 is no longer an option, because it's too expensive.
We opted to set up our own static file server, so it's going to be Django, Heroku, and an on-premises file server. We need some suggestions:
Is our decision good enough? Are there other options?
Is Nginx a good choice for the file server in this application?
Is there good documentation about uploading large files from a Django + Heroku application to an Nginx server?
Thanks.
1) Yes, your decision is the best possible one.
2) Nginx is the very best solution: Cloudflare serves more traffic with Nginx than most major web apps put together, and Netflix serves 33% of all US media traffic with Nginx.
3) S3 as an origin is not expensive, but the traffic out of it costs a lot. This should help: https://coderwall.com/p/rlguog/nginx-as-proxy-for-amazon-s3-public-private-files
Large file uploads should bypass the backend entirely: save them to disk asynchronously, then ship them to their destination with a separate process. With big uploads you also have to be careful about authentication; normally authentication happens only after the file has been uploaded, which can be dangerous. To solve that, try https://coderwall.com/p/swgfvw/nginx-direct-file-upload-without-passing-them-through-backend
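That separate process can be as simple as a loop that drains a spool directory; a hypothetical sketch in Python (the directory, bucket, and timing are assumptions, and the destination could just as well be the Nginx box instead of S3):

```python
# uploader.py -- hypothetical drain process: the web tier writes finished
# uploads into a spool directory, and this loop ships them off the box.
# Assumes files only appear here once they are fully written.
import os
import time

import boto3

SPOOL_DIR = "/var/spool/uploads"  # assumed location of completed uploads
s3 = boto3.client("s3")

while True:
    for name in os.listdir(SPOOL_DIR):
        path = os.path.join(SPOOL_DIR, name)
        # upload_file handles multipart automatically for big files
        s3.upload_file(path, "my-archive-bucket", name)
        os.remove(path)  # delete only after a successful upload
    time.sleep(5)
```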
We have developed a Facebook application that runs on Heroku. The application generates encrypted text that needs to be stored quickly. Currently, the text is simply written to a text file on the Heroku server, and this is not a scalable solution.
The data will eventually be downloaded to our local machines, but reliable intermediate storage between the app and the local machine is essential, because we are unable to download it rapidly on our end.
Would you recommend S3 for this purpose? Any alternatives?
+1 to S3. The Heroku filesystem is effectively read-only (changes don't persist), so you will probably have to use some third-party storage solution.
Yup, I would recommend S3. Very reliable.
Worth noting that S3 offers server-side encryption. In fact, the aws-s3 Ruby gem has built-in support for client-side encryption as well.
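The same idea from Python with boto3 is a single extra argument; a sketch (the bucket name and helper are placeholders):

```python
import boto3

def store_batch(encrypted_text: str, key: str) -> None:
    """Drop one batch of app-encrypted text into S3, encrypted at rest too."""
    s3 = boto3.client("s3")
    s3.put_object(
        Bucket="my-encrypted-drop",     # placeholder bucket name
        Key=key,
        Body=encrypted_text.encode(),
        ServerSideEncryption="AES256",  # ask S3 to encrypt the object at rest
    )
```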
I have Django and Django's admin set up on my production box. This means that all file uploads are stored on the production box; all the media lives there.
I now have a separate server for files (a different box and IP), and I want to upload my files there. What are the advantages and disadvantages of the methods I've thought about, and are there "better" methods you can suggest?
Setting up a script on the production box that rsyncs newly uploaded files over to the static server.
Setting up a permanent mount on the production box, by running a file server such as NFS, Samba, or SSHFS on the static media box and then using the mount location as the upload_to path on the production box.
Information: Both servers run debian.
EDIT: Prometheus from #django suggested http://docs.djangoproject.com/en/dev/howto/custom-file-storage/
I use Fabric. Especially of interest to you would be fabric.contrib.project.rsync_project().
To paraphrase:
Fabric is a Python library and command-line tool for streamlining the use of SSH for application deployment or systems administration tasks.
First use fabric.contrib.project.upload_project() to upload the entire project directory. From then on, bank on using fabric.contrib.project.rsync_project() to sync the project with the local version. Also of special interest is that this uses Unix rsync underneath and secure scp to transfer the .tar.gz data.
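A fabfile for that sync might look like this (Fabric 1.x; the host and paths are placeholders):

```python
# fabfile.py -- Fabric 1.x sketch for pushing media to the static box
from fabric.api import env
from fabric.contrib.project import rsync_project

env.hosts = ["media.example.com"]  # placeholder static media server

def sync_media():
    """rsync the local media directory to the media box over SSH."""
    rsync_project(
        remote_dir="/srv/media/",  # assumed destination on the media box
        local_dir="media/",
        exclude=(".git",),
        delete=False,              # keep remote files that are absent locally
    )
```

Run it with fab sync_media; since rsync only transfers changed files, repeated syncs stay cheap.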
I guess this takes care of your needs. There might not be a need to set up a permanent mount, etc.
If your static media is derived from your development process, then Fabric is the way to go. It can automate the deployment and replication of anything (data, application files, even static database dumps) from anywhere to anywhere.
If your static media is derived from your application server's operation (generated PDFs and images, uploads from your users, compiled binaries; I had a customer who wanted that, a Django app that would take in raw x86 assembly and return a compiled and linked binary), then you want to use Django Storages, an app that abstracts the actual storage of content for any ImageField or FileField (or anything with a Python file-like interface). It has support for storing files in databases, on Amazon S3, via FTP, and a few others.
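And if none of the built-in backends fit, the custom storage route from the howto linked in the question's edit is a small subclass; a skeleton with the remote calls stubbed out (the class name and media host are assumptions):

```python
# A minimal custom storage backend skeleton, per Django's custom-file-storage
# howto; the stubs would talk to the static media box (NFS, scp, rsync, ...).
from django.core.files.storage import Storage

class StaticBoxStorage(Storage):
    def _open(self, name, mode="rb"):
        raise NotImplementedError("fetch the file from the media box here")

    def _save(self, name, content):
        # Push content to the media box, then return the name it was stored under.
        raise NotImplementedError("upload to the media box here")

    def exists(self, name):
        return False  # stub: ask the media box whether the file already exists

    def url(self, name):
        return f"http://media.example.com/{name}"  # assumed media host
```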