Using the same static folder on multiple Django sites

I have a setup that consists of one main site and several subsites (and more are coming).
The main site and the subsites have the same layout, use the same JS, etc., so is it possible for all the sites to share a single static folder?
The static folder is currently 130 MB. It seems redundant to copy that folder every time a new site is created; with 200 sites (a somewhat realistic goal), that would be roughly 26 GB wasted on duplicate files.
So is there a way to do this? I know it goes somewhat against good Django practice (no use of collectstatic).

In a situation like this, I would use Amazon S3 and CloudFront. You can transparently upload all of your static files to your S3 bucket with django-storages when you run collectstatic, by replacing the default file storage backend with boto + S3 like so:
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
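For context, a fuller version of that configuration might look like the following sketch, assuming django-storages with the boto-based backend named above; the bucket name and credentials are placeholders:

```python
# settings.py -- a minimal sketch, assuming django-storages + boto are
# installed; the bucket name and keys below are placeholders.
INSTALLED_APPS = [
    # ...
    'storages',
]

# Route uploads and collectstatic output to the shared S3 bucket.
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
STATICFILES_STORAGE = 'storages.backends.s3boto.S3BotoStorage'

AWS_ACCESS_KEY_ID = 'your-access-key'          # placeholder
AWS_SECRET_ACCESS_KEY = 'your-secret-key'      # placeholder
AWS_STORAGE_BUCKET_NAME = 'shared-static'      # placeholder

# Every site can point STATIC_URL at the same bucket (or a CloudFront
# distribution in front of it), so the 130 MB payload exists only once.
STATIC_URL = 'https://shared-static.s3.amazonaws.com/'
```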
As @AdamKG stated, if all of these sites share the same code with different content, you're probably better off using django-CMS or moving these sites to database records rather than deploying the same code over and over.

AdamKG gave me the "right" answer - at least for my needs.
I might move to S3 at some point, when it's more relevant.
"Well, the easy hacks are symlinks & related. The question you should be asking, though, is why you're using Django projects as the unit of what seems to be (going by the count of ~200) a commodity. IOW: why do the sites have separate anything, including static media, instead of just being rows in a table?"

Why do we need to set up AWS and a Postgres DB when we deploy our app using Heroku?

I'm building a web API by following the YouTube video below, and up until the AWS S3 bucket setup I understood everything fine. But he first deploys everything locally, then, after making sure everything works, he transfers all the static files to AWS, and for the DB he switches from SQLite3 to Postgres.
django portfolio
I still don't understand why we need to put our static files on AWS and create a PostgreSQL database when Django already ships with a default SQLite3 database. I'm thinking that since I'm the only admin, just connecting my GitHub repo to Heroku should be enough, and anytime I change something in the API I just need to push those changes to the GitHub master branch.
Why do we need to use AWS for the static file location and set up an RDS (relational database) from the beginning? I'm still not getting it!
Can anybody help explain this?
Thanks
Databases
There are several reasons a video guide would encourage you to switch from SQLite to a database server such as MySQL or PostgreSQL:
SQLite is great but doesn't scale well if you're expecting a lot of traffic
SQLite doesn't work if you want to distribute your app across multiple servers. Going back to Heroku, if you serve your app with multiple dynos, you'll have a problem because each dyno will use a distinct SQLite database. If you edit something through the admin, it will happen on one of these databases at random, leading to inconsistencies
Some Django features aren't available on SQLite
SQLite is the default database in Django because it works out of the box, and is extremely fast and easy to use in local/development environments for prototyping.
However, it is usually not suited for production websites. Additionally, while it can be tempting to store your sqlite.db file along with your code, for instance in a git repository, it is considered a bad practice because your database can contain sensitive data (such as passwords, usernames, emails, etc.). Hence, a strict separation between your code and data is a good practice.
Another way to put it is that your code and your data have different lifecycles. You want to be able to edit data in your database without redeploying your code, and update your code without touching your database.
Even if you can remove public access to some files through GitHub, this is not good practice, because when you work in a team, developers may have access to the code but not to the production data, which is usually sensitive. If you work with 5 people and each of them has a copy of your database, the risk of losing it or having it stolen is 5x higher ;)
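To make the switch concrete, here is a minimal settings sketch; the credentials are placeholders, and dj-database-url is one common way to read them from Heroku's environment instead:

```python
# settings.py -- a sketch of replacing the default SQLite database with
# PostgreSQL; all credentials below are placeholders.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'myapp',
        'USER': 'myapp',
        'PASSWORD': 'secret',   # placeholder -- keep real values out of git
        'HOST': 'localhost',
        'PORT': '5432',
    }
}

# On Heroku, the dj-database-url package can build this dict from the
# DATABASE_URL environment variable instead of hard-coding values:
#   import dj_database_url
#   DATABASES = {'default': dj_database_url.config()}
```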
Static files
When you work locally, Django's built-in runserver command handles the serving of static assets such as CSS, JavaScript and images for you.
However, this server is not designed for production use either. It works great in development, but will start to fail very quickly on a production website, which has to handle far more requests than your local version.
Because of that, you need to host these static files somewhere else, and AWS is one place where you can do that. AWS will serve those files for you very efficiently. There are other options available, for instance configuring a reverse proxy with Nginx to serve the files for you, if you're using a dedicated server.
As far as I can tell, the progression you describe from the video takes you from a local development environment to a more efficient and scalable production setup. That is to be expected, because it's less daunting to start with something really simple (SQLite, Django's built-in runserver) and move on to more complex and abstract topics and tools later on.
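As a rough illustration of that split, production settings usually gather the assets into one place for a real web server (or S3) to serve; the path below is a placeholder:

```python
# settings.py -- a sketch of the usual production arrangement.
DEBUG = False
STATIC_URL = '/static/'
STATIC_ROOT = '/var/www/myapp/static'   # placeholder path

# At deploy time, `python manage.py collectstatic --noinput` copies every
# app's assets into STATIC_ROOT; Nginx, S3/CloudFront, or similar then
# serves that directory instead of Django's runserver.
```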

How can I get Django to serve media files in production?

How can I get Django to serve media files in production, with DEBUG = False, on a Heroku server?
I know it's better not to do this and that it leads to a loss of performance, but my application is used only by me and my cat, so I don't think the performance hit matters in my case.
The reason this won't work is that the Heroku file system is ephemeral: any files uploaded after your app code is pushed are lost whenever your app restarts. This leaves your app with image links in the DB that point to non-existent files.
You can read more about it here:
https://help.heroku.com/K1PPS2WM/why-are-my-file-uploads-missing-deleted
Your best bet is to upload your files to a bucket like Amazon S3. It costs almost nothing at low volume and is very reliable.
https://blog.theodo.com/2019/07/aws-s3-upload-django/
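A minimal sketch of that setup with django-storages and its boto3 backend, assuming the bucket and keys are supplied via environment variables:

```python
# settings.py -- a sketch of sending user uploads to S3 instead of
# Heroku's ephemeral filesystem; bucket and keys are placeholders
# read from the environment.
import os

DEFAULT_FILE_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage'
AWS_STORAGE_BUCKET_NAME = os.environ.get('AWS_STORAGE_BUCKET_NAME')
AWS_ACCESS_KEY_ID = os.environ.get('AWS_ACCESS_KEY_ID')
AWS_SECRET_ACCESS_KEY = os.environ.get('AWS_SECRET_ACCESS_KEY')

# FileField/ImageField uploads now land in the bucket, so the links
# stored in the database survive dyno restarts.
```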

Best way to publicly download a full folder with Amazon?

I'm building a launcher (in C#) that downloads a full game or app. The app can be very large (e.g. 5 GB), and I need to download it with the correct folder hierarchy so the same launcher can check whether the user has the correct app or whether it needs to be repaired or updated.
I'm trying to do that with Amazon S3 and CloudFront, but it seems that I can only get individual objects, not the app's full folder.
I have also stored the folder on an EC2 instance, and that works, but EC2 doesn't seem designed for this, and downloads are extremely slow.
Is there any amazon service to do that?
Have you considered zipping the files first? It solves a lot of issues (e.g. folder structure and compression) and works great from S3 and CloudFront. It's a common solution for this use case.
You can do this in your application with the DownloadDirectory method of the TransferUtility class in the .NET SDK.
You can read more about the DownloadDirectory method here. By default I believe it only downloads objects in the root path, so don't forget to recurse into sub-folders if necessary.
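For readers outside the .NET SDK, the same recursive idea can be sketched in Python with boto3; the bucket name and key prefix below are placeholders, and this is only a rough analogue of TransferUtility.DownloadDirectory, not the .NET call itself:

```python
# A sketch of downloading every object under an S3 key prefix while
# preserving the folder hierarchy locally. Bucket and prefix are placeholders.
import os
import boto3

def download_prefix(bucket_name, prefix, dest_dir):
    bucket = boto3.resource('s3').Bucket(bucket_name)
    for obj in bucket.objects.filter(Prefix=prefix):
        if obj.key.endswith('/'):   # skip zero-byte "directory" markers
            continue
        target = os.path.join(dest_dir, os.path.relpath(obj.key, prefix))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        bucket.download_file(obj.key, target)

download_prefix('my-game-bucket', 'game/v1/', './game')
```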

Why is serving static files in production using Django discouraged?

I have developed a web application that (obviously) uses some static files. To deploy it, I've chosen to serve the files through the WSGI interpreter, using gunicorn behind a firewall and a reverse proxy.
My application uses WhiteNoise to serve static files. Everything works fine and I don't have any performance issues... but I really can't understand WHY the practice of serving those static files directly through the WSGI interpreter is discouraged (LINK), which says:
This is not suitable for production use! For some common deployment strategies...
I mean, my service is a collection of microservices: DB, frontend, services, etc. If I need to scale them, I can do so without any problem, and with this philosophy I'm not worried about the footprint of my microservices. To me this seems logical, but maybe for the rest of the world it's a completely unreasonable strategy.
You've misinterpreted that documentation. It's fine to use WhiteNoise to serve static files; that is entirely what it's for. What is not a good idea is using Django's internal static-serving function for the job, since it is inefficient.
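For reference, the WhiteNoise setup this answer endorses is only a few lines; this sketch follows the pattern from the WhiteNoise documentation:

```python
# settings.py -- a minimal WhiteNoise configuration sketch.
MIDDLEWARE = [
    'django.middleware.security.SecurityMiddleware',
    'whitenoise.middleware.WhiteNoiseMiddleware',  # directly after SecurityMiddleware
    # ... the rest of your middleware ...
]

# Optional: compress files and add cache-busting hashed filenames.
STATICFILES_STORAGE = 'whitenoise.storage.CompressedManifestStaticFilesStorage'
```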
Three reasons why I personally serve static files from a CDN:
1- You use up your app server's bandwidth and lose time serving these static files, instead of handing that load to a CDN (though WhiteNoise should largely eliminate that).
2- Some hosting services like AWS will charge you for extra traffic in/out, while you can use cheaper services like CloudFront and an S3 bucket.
3- I like to keep my app servers for app purposes only and use each service for its own job; this helps with debugging and reduces my failure points.
On the other hand, serving static files from the app server with something like WhiteNoise is much, much easier than configuring a CDN.
Hope this helps!
It's quite OK when you use WhiteNoise because:
WhiteNoise is made exactly for this purpose and is therefore efficient
It'll set the HTTP response headers correctly so clients cache the files.
But think of it this way: instead of serving 1 or 2 requests per web page, you'll often get 10x more requests (a typical web page requests a bunch of images, one or more CSS files, a couple of JS files...). That means you have to scale your application server to handle 10x more traffic on average than if you left the job to a CDN.
By the way, I've written a tutorial on this topic which may help.

Django + S3 (boto) + Sorl Thumbnail: Suggestions for optimisation

I am using an S3 storage backend across a Django site I am developing, both to reduce load on the EC2 server(s) and to allow multiple webservers (redundancy, load balancing) to access the same set of uploaded media.
Sorl.thumbnail (v11) template tags are being used in our templates to allow flexible image resizing/cropping.
Performance on media-rich pages is not very good, and when a page whose thumbnails need to be generated for the first time is accessed, the requests even time out.
I understand that this is due to sorl.thumbnail checking/downloading the original image from S3 (which could be quite large and high-resolution), then rendering/checking/uploading the thumbnail.
What would you suggest is the best solution to this setup?
I have seen suggestions of storing a local copy of files in addition to the S3 copy (not so great when a couple of servers are used for load balancing). I've also seen it suggested to store 0-byte files to fool sorl.thumbnail.
Are there any other suggestions or better ways of approaching this?
sorl.thumbnail is now built with slow remote storages in mind. The first creation of a thumbnail still queries the storage (for example, when it is first accessed from a template), but after that the references are cached in a key-value store. You still need that first query and creation, though; one solution is to call the low-level API sorl.thumbnail.get_thumbnail with the same options as your templates when the image is uploaded, and to push that thumbnail-creation job onto a queue like Celery, as sketched below.
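A minimal sketch of that idea; the task name, geometries and options are hypothetical and must mirror whatever the templates actually use:

```python
# tasks.py -- pre-generate thumbnails in a Celery worker at upload time,
# using sorl's low-level API. The geometries/options are hypothetical and
# must match the {% thumbnail %} calls in the templates exactly, or a
# different cache key is produced and the work is wasted.
from celery import shared_task
from sorl.thumbnail import get_thumbnail

@shared_task
def pregenerate_thumbnails(image_path):
    for geometry in ('100x100', '640x480'):
        get_thumbnail(image_path, geometry, crop='center', quality=80)
```

The task would then be queued from the upload view or a post_save signal, e.g. pregenerate_thumbnails.delay(instance.image.name).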
You can use Sorlery. It combines sorl and celery to create thumbnails via workers. It's very careful not to do any filesystem access outside of the worker thread.
The thumbnail returned immediately (before the worker has had a chance) can be controlled by setting your THUMBNAIL_DUMMY_SOURCE to an appropriate placeholder.
The job is created the first time the thumbnail is requested; subsequent requests are served the dummy image until the worker completes.
Much the same as @Aidan's solution, I have made some tweaks to sorl-thumbnail. I also pre-generate thumbnails with celery. My code is here: sorl_thumbnail-async
But then I learned that easy_thumbnails does exactly what I was trying to do, so I am using it in my current project. You might find it useful; a short post on the topic is here
The easiest solution I've found so far is actually this third-party service: http://cloudinary.com/