We used to use the following combination: Django framework with Heroku as the application server and Amazon S3 as the static file server.
But recently we need to build a system that handles a large amount of video data, with more than 10 TB of data transfer per month. That rules out Amazon S3, because at that volume it's too expensive.
We opted to set up our own static file server instead, so the stack would be Django, Heroku, and an on-premises file server. We need some suggestions:
Is our decision good enough? Any other options?
Is Nginx a good choice for the file server in this application?
Is there good documentation on uploading large files from a Django + Heroku application to an Nginx server?
Thanks.
1) Yes, your decision is the best possible one.
2) Nginx is an excellent choice. Cloudflare serves more traffic through Nginx than most major web apps combined, and Netflix serves about 33% of all US media traffic with Nginx.
3) S3 as an origin is not expensive; it's the outbound traffic that costs a lot. This should help: https://coderwall.com/p/rlguog/nginx-as-proxy-for-amazon-s3-public-private-files
Large file uploads should bypass the backend entirely: save them to disk asynchronously, then push them to their final destination with a separate process. With big uploads you also have to be careful about authentication, because normally authentication happens only after the file has been fully uploaded, which can be dangerous. To solve that, try https://coderwall.com/p/swgfvw/nginx-direct-file-upload-without-passing-them-through-backend
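The linked "direct upload" pattern roughly looks like the nginx config below. This is a hedged sketch, not the exact config from the link: the paths, upstream name, and the `/auth` endpoint are placeholders, and the `auth_request` line assumes the `ngx_http_auth_request_module` is available.

```nginx
# Sketch: nginx writes the upload body straight to disk and hands the
# backend only the temp-file path, so large files never stream through Django.
location /upload {
    auth_request /auth;                 # check credentials before processing (placeholder endpoint)

    client_body_temp_path /tmp/nginx-uploads;
    client_body_in_file_only on;        # always buffer the body to a file
    client_body_buffer_size 128K;
    client_max_body_size 1000M;

    proxy_pass_request_headers on;
    proxy_set_header X-FILE $request_body_file;  # backend reads the path, not the bytes
    proxy_pass_request_body off;
    proxy_set_header Content-Length "";
    proxy_pass http://django_backend;   # placeholder upstream
}
```

The backend then picks the file up from `/tmp/nginx-uploads` and ships it to its final destination in a separate process, as described above.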
Architecture Description
I have a Django app hosted in an Azure App Service container and proxied through Cloudflare's DNS.
The app works great, and using WhiteNoise I am able to serve static files from the storage that comes with the App Service (10 GB). The thing is, that storage only holds files used by the web app itself (files uploaded during the build; there's no option to manually add other files), and it is limited to 100 GB/month of egress bandwidth.
I would like to try and use Cloudflare's R2 storage, as it has unlimited bandwidth, and allows you to upload any kind of files. I'll mainly be using images.
Question
How can static files be served from Cloudflare's R2 in a Django app?
EDIT:
I have successfully connected to my Cloudflare R2 bucket using Boto3, but I still can't link it to the Django app on Azure.
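For the "link it to the Django app" part, a common approach is django-storages with its S3 backend pointed at R2's S3-compatible endpoint. A minimal settings sketch, assuming django-storages and boto3 are installed; the account ID, bucket name, credentials, and domain below are all placeholders:

```python
# settings.py fragment (sketch) -- also add "storages" to INSTALLED_APPS.
# Every identifier below is a placeholder, not a real credential.
DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
AWS_ACCESS_KEY_ID = "<r2-access-key-id>"
AWS_SECRET_ACCESS_KEY = "<r2-secret-access-key>"
AWS_STORAGE_BUCKET_NAME = "my-bucket"
AWS_S3_ENDPOINT_URL = "https://<account-id>.r2.cloudflarestorage.com"
AWS_S3_CUSTOM_DOMAIN = "cdn.example.com"  # public domain in front of the bucket
AWS_S3_REGION_NAME = "auto"               # R2 uses "auto" as its region
AWS_DEFAULT_ACL = None                    # R2 has no S3-style object ACLs
```

With this in place, `FileField`/`ImageField` uploads go to R2 instead of the App Service disk, sidestepping both the 10 GB limit and the egress cap.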
I don't know how well R2 will function as a CDN from a cost/latency perspective, but as long as you stay within the free limits it's probably mostly fine (TTFB latency is going to be the biggest issue as with any object store).
We are working on making it possible to put Cache in front of R2 so that will help on the performance & cost front once that becomes available.
Unfortunately, I think SO is going to be a poor medium to debug your issue as it's unclear. Perhaps get realtime help on the R2 discord and then come back here to post the answer once you figure out your issue?
Since this sounds like tricky work, if you have a working example, we're happy to host it on https://developers.cloudflare.com/r2/examples/.
How can I get Django to serve media files in production (DEBUG = False) on a Heroku server?
I know that it’s better not to do this and that this will lead to a loss of performance, but my application is used only by me and my cat, so I don't think that this will be unjustified in my case.
The reason this won't work is that the Heroku file system is ephemeral: any files uploaded after your app's code is pushed are lost whenever your app restarts. This will leave your app with image links in the DB that point to non-existent files.
You can read more about it here:
https://help.heroku.com/K1PPS2WM/why-are-my-file-uploads-missing-deleted
Your best bet is to upload your files to a bucket like Amazon S3. It costs almost nothing at small scale and is very reliable.
https://blog.theodo.com/2019/07/aws-s3-upload-django/
I have developed a web application that uses (obviously) some static files. To deploy it, I've chosen to serve the files from the WSGI server itself, using gunicorn behind a firewall and a reverse proxy.
My application uses WhiteNoise to serve static files. Everything works fine and I don't have any performance issues... but I really can't understand WHY serving static files directly from the WSGI server is a discouraged practice (LINK), which says:
This is not suitable for production use! For some common deployment strategies...
I mean, my service is a collection of microservices: DB, frontend, services, etc. If I need to scale them, I can do so without any problem, and with this philosophy I'm not worried about the footprint of each microservice. To me this seems logical, but maybe, for the rest of the world, it's a completely out-of-mind strategy.
You've misinterpreted that documentation. It's fine to use Whitenoise to serve static files; that is entirely what it's for. What is not a good idea is to use that internal Django function to do so, since it is inefficient.
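For reference, the efficient setup the answer endorses is WhiteNoise wired in as middleware rather than Django's internal static-serving view. A minimal sketch of the usual configuration, following the WhiteNoise documentation (the rest of the middleware list is elided):

```python
# settings.py fragment (sketch) -- the usual WhiteNoise wiring.
from pathlib import Path

BASE_DIR = Path(__file__).resolve().parent

MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    # WhiteNoise goes directly after SecurityMiddleware:
    "whitenoise.middleware.WhiteNoiseMiddleware",
    # ... the rest of your middleware ...
]

STATIC_ROOT = BASE_DIR / "staticfiles"
# Compressed files with hashed names, so far-future cache headers are safe:
STATICFILES_STORAGE = "whitenoise.storage.CompressedManifestStaticFilesStorage"
```

With this, `collectstatic` prepares the files at deploy time and WhiteNoise serves them efficiently from the WSGI process, which is exactly the supported use case.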
Three reasons why I personally serve static files from a CDN:
1- You use up bandwidth on your app server and lose time serving static files, instead of handing all that load off to a CDN. (WhiteNoise should mostly eliminate that, though.)
2- Some hosting services like AWS will charge you for extra inbound/outbound traffic, while you can use cheaper services like CloudFront with an S3 bucket.
3- I like to keep my app servers for app purposes only and use each service for its own job; this helps with debugging and reduces my failure points.
On the other hand, serving static files from the app server with something like WhiteNoise is much, much easier than configuring a CDN.
Hope this helps!
It's quite OK when you use WhiteNoise, because:
WhiteNoise is made exactly for this purpose and is therefore efficient.
It sets the HTTP response headers correctly so clients cache the files.
But think of it this way: instead of serving 1 or 2 requests per web page, you'll often get 10x more (a typical page requests a bunch of images, one or more CSS files, a couple of JS files...). That means you have to scale your application server to handle roughly 10x more traffic on average than if you left the job to a CDN.
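A quick back-of-envelope calculation makes the scaling point concrete. The per-page numbers here are illustrative assumptions, not figures from the answer:

```python
# Requests your app server absorbs with vs. without a CDN (illustrative numbers).
page_views = 600        # HTML page loads per minute
assets_per_page = 9     # e.g. 5 images + 2 CSS files + 2 JS files

requests_without_cdn = page_views * (1 + assets_per_page)  # app serves everything
requests_with_cdn = page_views                             # app serves HTML only

print(requests_without_cdn, requests_with_cdn)  # 6000 600
```

Same page views, ten times the request volume hitting the app server, which is the multiplier the paragraph above refers to.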
By the way, I've written a tutorial on this topic which may help.
I am setting up a big ecommerce website with a million products on Magento CE 1.9.3.4 (each product has 3-4 images on average).
My goal is to ease my VPS file system load by putting my /media/ folder in an S3 bucket. I have achieved this using this extension: https://github.com/thaiphan/magento-s3/
Problem
If a media file is not found on S3 (usually cached images, or maybe others), the request should fall back to the Magento web server, but that's not happening. Am I missing something in the S3 configuration or in the above extension?
I also tried S3FS-FUSE; with that extension I faced the same problem with rsync. Because rsync, and even the Amazon CLI, are slow, it takes a long time to sync cached images.
A probable alternative I found but don't understand:
http://inchoo.net/magento/set-up-cdn-in-magento/
I just want to ask two simple questions:
1) Am I missing any configuration that would resolve this cached-images issue? Any troubleshooting guidance would be really helpful.
2) Do you think using the above alternative (CloudFront) will serve my purpose of easing the VPS load for images? What I know about CloudFront is that it helps global websites, and my website is country-specific, which is why I went for S3.
If I host a small Django website on Heroku using just one dyno, is it safe to upload media files to that server, or do I need to use AWS S3 to store media files? What are the other alternatives for media storage?
No, it is never safe to store things on the Heroku filesystem. Even though you only have one dyno, it is still ephemeral, and can be killed at any time; for example when you push new code.
Using S3 is the way to go (alternatives include the Azure and Google offerings). There are several other advantages to using S3, chiefly the ability to serve files without stressing your small server.
While your site is small, a dyno is very small as well, so a major advantage of S3, used correctly, is that the AWS S3 infrastructure serves the files for you. By "used correctly", I mean that you upload and serve files directly to/from S3, so your server is only used for signing the S3 URLs; the actual files never go through your server.
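The "signing" step mentioned above is normally a one-liner with boto3's `generate_presigned_url`. Purely to illustrate what a presigned URL actually is, here is a stdlib-only sketch of AWS Signature Version 4 query-string signing; the bucket, key, and credentials are made up, and real code should just use boto3:

```python
import datetime, hashlib, hmac, urllib.parse

def presign_get_url(bucket, key, access_key, secret_key,
                    region="us-east-1", expires=3600):
    """Build a SigV4 presigned GET URL for an S3 object (educational sketch)."""
    host = f"{bucket}.s3.{region}.amazonaws.com"
    now = datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    datestamp = now.strftime("%Y%m%d")
    scope = f"{datestamp}/{region}/s3/aws4_request"
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    query = "&".join(f"{k}={urllib.parse.quote(v, safe='')}"
                     for k, v in sorted(params.items()))
    # Canonical request: method, URI, query, headers, signed headers, payload hash.
    canonical = "\n".join([
        "GET",
        "/" + urllib.parse.quote(key),
        query,
        f"host:{host}\n",
        "host",
        "UNSIGNED-PAYLOAD",
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical.encode()).hexdigest(),
    ])
    def _hmac(k, msg):
        return hmac.new(k, msg.encode(), hashlib.sha256).digest()
    # Derive the signing key: date -> region -> service -> "aws4_request".
    signing_key = _hmac(_hmac(_hmac(_hmac(
        b"AWS4" + secret_key.encode(), datestamp), region), "s3"), "aws4_request")
    signature = hmac.new(signing_key, string_to_sign.encode(),
                         hashlib.sha256).hexdigest()
    return f"https://{host}/{urllib.parse.quote(key)}?{query}&X-Amz-Signature={signature}"

# Placeholder bucket, key, and credentials:
url = presign_get_url("my-bucket", "uploads/cat.jpg", "AKIAEXAMPLE", "secretexample")
print(url)
```

The client then uploads/downloads against that URL directly, so the dyno only ever does the cheap HMAC arithmetic, never the file transfer.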
Check https://devcenter.heroku.com/articles/s3-upload-python and http://docs.fineuploader.com/quickstart/01-getting-started.html (I strongly recommend Fine Uploader if you can use the free version or afford the small license fee).
Obviously, you can also just implement S3 media files in Django using django-storages-redux, but that means your server will be busy uploading files. If that's OK for your small server, then it's fine too.