Heroku & Django: Storing a single jpg asset

With Heroku's ephemeral filesystem, it's obvious we can't host user-uploaded images or dynamically changing ones in the app's filesystem because these won't be available to other dynos.
However, I'm looking for a solution to store a SINGLE image, used as a sort of logo, in a Django URLField() - this image's URL will serve as a fallback if an image at a remote location (an external service) is not available (hence the URLField and not an ImageField).
picture_url = models.URLField(null=True)
For me, setting up and using services like Amazon S3 seems like overkill. Using services like imgur/Photobucket seems unreliable.
Any workarounds?

Sadly, no.
There's no functional difference between storing a single image and storing multiple images. The issue is that Heroku deploys a snapshot of your code, as represented in Git. If an asset doesn't live in your repository, it will be thrown away whenever Heroku re-deploys your code (including when you push new code).
An image hosting service (storing the URL as you're already planning to do) is your best bet.
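That said, if the asset really is a single, rarely changing logo, one workaround is to commit a fallback copy as a static file (so it survives redeploys and is served like any other static asset) and keep the URLField for the hosted version. A minimal sketch, assuming a hypothetical Logo model and static path:
# Sketch: picture_url points at the externally hosted image, and display_url
# falls back to a static placeholder committed to the repository.
from django.db import models
from django.templatetags.static import static


class Logo(models.Model):
    picture_url = models.URLField(null=True, blank=True)

    @property
    def display_url(self):
        # Use the hosted image if we have one, otherwise fall back to a
        # static asset that ships with the code (path is a placeholder).
        return self.picture_url or static("img/logo-fallback.jpg")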

Related

Why do we need to set up AWS and a Postgres DB when we deploy our app using Heroku?

I'm building a web API by following the YouTube video below, and until the AWS S3 bucket setup I understood everything fine. But he first deploys everything locally, then after making sure everything works he transfers all the static files to AWS, and for the DB he switches from SQLite3 to Postgres.
django portfolio
I still don't understand why we need to put our static files on AWS and create a PostgreSQL database when there is already a default SQLite3 database from Django. I'm thinking that if I'm the only admin, just connecting my GitHub repo to Heroku should be enough, and any time I change something in the API I just need to push those changes to the GitHub master branch and that should be it.
Why do we need to use AWS as the static file location and set up an RDS (relational database) and do all of this from the beginning? I'm still not getting it!
Can anybody help explain this?
Thanks
Databases
There are several reasons a video guide would encourage you to switch from SQLite to a database server such as MySQL or PostgreSQL:
SQLite is great but doesn't scale well if you're expecting a lot of traffic
SQLite doesn't work if you want to distribute your app across multiple servers. Going back to Heroku, if you serve your app with multiple dynos, you'll have a problem because each dyno will use a distinct SQLite database. If you edit something through the admin, it will happen on one of these databases, at random, leading to inconsistencies
Some Django features aren't available on SQLite
SQLite is the default database in Django because it works out of the box, and is extremely fast and easy to use in local/development environments for prototyping.
However, it is usually not suited for production websites. Additionally, while it can be tempting to store your sqlite.db file along with your code, for instance in a git repository, it is considered a bad practice because your database can contain sensitive data (such as passwords, usernames, emails, etc.). Hence, a strict separation between your code and data is a good practice.
Another way to put it is that your code and your data have different lifecycles. You want to be able to edit data in your database without redeploying your code, and update your code without touching your database.
Even if you can remove public access to some files through GitHub, this is not a good practice, because when you work in a team with multiple developers, those developers may have access to the code but not to the production data, which is usually sensitive. If you work with 5 people and each one of them has a copy of your database, the risk of losing it or having it stolen is 5x higher ;)
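For reference, a common way to wire this up is to keep SQLite locally and read the Postgres connection from Heroku's DATABASE_URL config var, for example with the dj-database-url package. A minimal settings.py sketch, assuming that package is installed:
# settings.py (sketch) - use Postgres on Heroku via DATABASE_URL,
# fall back to SQLite for local development.
import os
import dj_database_url

BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

DATABASES = {
    "default": dj_database_url.config(
        default="sqlite:///" + os.path.join(BASE_DIR, "db.sqlite3"),
        conn_max_age=600,  # keep database connections open between requests
    )
}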
Static files
When you work locally, Django's built-in runserver command handles the serving of static assets such as CSS, Javascript and images for you.
However, this server is not designed for production use either. It works great in development, but will start to fail very fast on a production website, that should handle way more requests than your local version.
Because of that, you need to host these static files somewhere else, and AWS is one place where you can do that. AWS will serve those files for you, in a very efficient way. There are other options available, for instance configuring a reverse proxy with Nginx to serve the files for you, if you're using a dedicated server.
As far as I can tell, the progression you describe from the video takes you from a local, development environment to a more efficient and scalable production setup. That is to be expected, because it's less daunting to start with something really simple (SQLite, Django's built-in runserver) and move on to more complex and abstract topics and tools later on.
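As an illustration, pointing Django's static files at S3 is typically done with the django-storages package and boto3; a sketch of the relevant settings (the bucket name is a placeholder, credentials come from environment variables):
# settings.py (sketch) - serve static files from S3 via django-storages.
# Requires: pip install django-storages boto3
import os

INSTALLED_APPS = [
    # ... your other apps ...
    "storages",
]

AWS_ACCESS_KEY_ID = os.environ.get("AWS_ACCESS_KEY_ID")
AWS_SECRET_ACCESS_KEY = os.environ.get("AWS_SECRET_ACCESS_KEY")
AWS_STORAGE_BUCKET_NAME = "my-app-static"  # placeholder bucket name
AWS_S3_CUSTOM_DOMAIN = f"{AWS_STORAGE_BUCKET_NAME}.s3.amazonaws.com"

STATIC_URL = f"https://{AWS_S3_CUSTOM_DOMAIN}/static/"
STATICFILES_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"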

Heroku doesn't update github file system when an image is uploaded from website

I ran into the problem where Heroku doesn't update my GitHub repository (or say static filesystem) when a blog post (including pictures) is created from the website.
Other images survive, whilst the ones saved to my filesystem while the server was running on Heroku disappear.
I found this on their documentation.
The Heroku filesystem is ephemeral - that means that any changes to the filesystem whilst the dyno is running only last until that dyno is shut down or restarted.
I'm still confused why not all the pictures disappear and only those added later do.
Is AWS S3 a solution for this? If it is, how can I represent my filesystem using buckets?
Say, for the Blog Post 1 I have 2 picture resolutions, which means storing the files in different folders corresponding to those resolutions.
1920x1920/
    picture.jpg
800x800/
    picture.jpg
Does that mean I have to create 2 buckets named 1920x1920 and 800x800 or is there a better way of handling them?
Is AWS S3 a solution for this?
S3 is the recommended solution for this, and the configuration is documented in Heroku DevCentre with specific instructions for uploading from Python.
Note these Python instructions use the Direct Upload approach: have the Flask app generate a pre-signed URL, which is then passed back to the client-side JavaScript code so that the user's browser can make the upload to S3 directly. The resulting S3 URL of the image is then put into a hidden element in the form, which is received by your app on form submit.
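For illustration, generating such a pre-signed POST with boto3 could look like the sketch below (the bucket name, key layout, and the presign_upload helper are placeholders, not the code from the Heroku guide); the browser then POSTs the file straight to S3 using the returned URL and fields:
# Sketch: server-side helper that hands the browser a pre-signed POST,
# so the upload goes directly from the browser to S3.
import boto3

s3 = boto3.client("s3")

def presign_upload(filename, content_type):
    # Bucket name is a placeholder; the key keeps the user's filename.
    return s3.generate_presigned_post(
        Bucket="my-app-uploads",
        Key=f"uploads/{filename}",
        Fields={"Content-Type": content_type},
        Conditions=[{"Content-Type": content_type}],
        ExpiresIn=3600,  # pre-signed POST valid for one hour
    )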
The fact that you have separate image sizes suggests your app does some processing (maybe with PIL) to get these thumbnails. In that case it may be easier to use the Pass-Through approach, where your app implements its own upload mechanism, does the processing, and then uploads the thumbnails to S3 (the upload-to-S3 part is well documented, for example in this SO thread).
The Pass-Through method carries the warning that this may cause blocking of a single threaded worker. If your site gets a volume of requests that causes this to be an issue, you may need to increase the number of gunicorn workers, or change to a worker type that supports concurrency (This github post has some useful commands/info on concurrent worker types).
The best way to implement this whole thing (although the requirement for a Redis add-on and a worker dyno may push you into the paid tier) may be with Background Tasks using RQ. You use the Direct-Upload approach above to upload the original image, then have a background job download it, do the resizing, and put the resulting thumbnails back onto S3.
Does that mean I have to create 2 buckets named 1920x1920 and 800x800 or is there a better way of handling them?
Have one Bucket for the entire app, and just include forward slashes in the object's key to mimic a subdirectory structure.
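In code, the "subdirectory" is just a prefix on the object key; a minimal boto3 sketch with a placeholder bucket name:
# Sketch: one bucket, with the resolution encoded as a key prefix.
import boto3

s3 = boto3.client("s3")
bucket = "my-blog-media"  # placeholder bucket name

# Both sizes of the same picture live in one bucket, under different prefixes.
s3.upload_file("picture_1920.jpg", bucket, "1920x1920/picture.jpg")
s3.upload_file("picture_800.jpg", bucket, "800x800/picture.jpg")

# Listing by prefix behaves like listing a folder.
resp = s3.list_objects_v2(Bucket=bucket, Prefix="800x800/")
for obj in resp.get("Contents", []):
    print(obj["Key"])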

Saving images on files or on database when using docker

I'm using Docker for a project; the main goal is to keep the application available even if one of the nodes (it's a 6-node cluster with Docker Swarm) goes down.
The application is basically a Django app that can save images uploaded by users, along with other models. I'm currently saving the images as files, but since that requires specifying a local volume on a single machine, I would like to know whether it would be better to save the images in the database cluster, so they stay available even if a whole node goes down. Or is there another way?
#Edit
Note: The cluster runs locally and doesn't have internet access
The two options are to perform the file sharing via the database or via the file system.
For file system sharing, you can use something like GlusterFS: each container appears to mount a host-local volume, but that volume is actually shared between the hosts via GlusterFS.
To my mind, if it's your application (i.e. you can modify it at will), saving the files in the database is the easier approach for most developers.
The best solution is often to go for a hosted option (such as MongoDB Atlas). Making a database resilient and highly available is really hard, and unless you are an expert on docker and mongo I would strongly recommend you to go for a hosted option.
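If you do go the database route with Django, one simple sketch is to store the raw bytes in a BinaryField and serve them from a view (the StoredImage model and serve_image view are made-up names for illustration; storing large blobs in the database has its own trade-offs):
# models.py / views.py (sketch) - store image bytes in the database so every
# Swarm node sees the same data through the shared database.
from django.db import models
from django.http import HttpResponse


class StoredImage(models.Model):
    name = models.CharField(max_length=255)
    content_type = models.CharField(max_length=100, default="image/jpeg")
    data = models.BinaryField()


def serve_image(request, pk):
    # Fetch the blob from the database and stream it back to the browser.
    image = StoredImage.objects.get(pk=pk)
    return HttpResponse(bytes(image.data), content_type=image.content_type)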

Django + S3 (boto) + Sorl Thumbnail: Suggestions for optimisation

I am using an S3 storage backend across a Django site I am developing, both to reduce load on the EC2 server(s) and to allow multiple webservers (redundancy, load balancing) to access the same set of uploaded media.
Sorl.thumbnail (v11) template tags are being used in our templates to allow flexible image resizing/cropping.
Performance on media-rich pages is not very good, and when a page containing thumbnails needing to be generated for the first time is accessed, the requests even time out.
I understand that this is due to sorl thumbnail checking/downloading the original image from S3 (which could be quite large and high resolution), and rendering/checking/uploading the thumbnail.
What would you suggest is the best solution to this setup?
I have seen suggestions of storing a local copy of files in addition to the S3 copy (not too great when a couple of servers are being used for load balancing). I've also seen it suggested to store 0-byte files to fool sorl.thumbnail.
Are there any other suggestions or better ways of approaching this?
sorl-thumbnail is now built with slow remote storages in mind. The first creation of the thumbnail is, however, done by querying the storage, for example when it is first accessed from a template, but after that the references are cached in a key-value store. You still need that first query and creation; one solution is to use the low-level API sorl.thumbnail.get_thumbnail with the same options when the image is uploaded. When the image is uploaded, add this thumbnail-creation job to a queue such as Celery, as in the sketch below.
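A rough sketch of that idea, assuming Celery is already configured and a hypothetical Photo model with an image field; the geometry string and options must match what your templates request:
# tasks.py (sketch) - pre-generate the thumbnail right after upload so the
# first template render hits the key-value cache instead of the slow storage.
from celery import shared_task
from sorl.thumbnail import get_thumbnail


@shared_task
def pregenerate_thumbnail(photo_pk):
    from myapp.models import Photo  # hypothetical app/model names

    photo = Photo.objects.get(pk=photo_pk)
    # Same geometry/options as the {% thumbnail %} tag used in templates.
    get_thumbnail(photo.image, "300x200", crop="center", quality=85)


# Called from the upload view or a post_save signal:
# pregenerate_thumbnail.delay(photo.pk)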
You can use Sorlery. It combines sorl and celery to create thumbnails via workers. It's very careful not to do any filesystem access outside of the worker thread.
The thumbnail returned immediately (before the worker has had a chance) can be controlled by setting your THUMBNAIL_DUMMY_SOURCE to an appropriate placeholder.
The job is created the first time the thumbnail is requested, subsequent requests are served the dummy image until the worker thread completes.
Almost the same as #Aidan's solution: I have made some tweaks to sorl-thumbnail, and I also pre-generate thumbnails with Celery. My code is here: sorl_thumbnail-async
But I came to know that easy_thumbnails does exactly what I was trying to do, so I am using it in my current project. You might find it useful; a short post on the topic is here
The easiest solution I've found so far is actually this third party service: http://cloudinary.com/

How can I upload my files to my static server from a production server?

I have Django and Django's admin set up on my production box. This means that all file uploads are stored on the production box; all the media lives there.
I have a separate server for files now ( different box, ip ). I want to upload my files there. What are the advantages and disadvantages of these methods I've thought about, and are there "better" methods you can suggest?
Setting up a script to do an rsync on the production box after a file is uploaded to the static server.
Setting up a permanent mount on the production box, by using a fileserver on the static media box such as nfs/samba/ssh-fs and then using the location of the mount as the upload_to path on the production box
Information: Both servers run debian.
EDIT: Prometheus from #django suggested http://docs.djangoproject.com/en/dev/howto/custom-file-storage/
I use Fabric. Especially of interest to you would be fabric.contrib.project.rsync_project().
To paraphrase: Fabric is a Python library and command-line tool for streamlining the use of SSH for application deployment or systems administration tasks.
First use fabric.contrib.project.upload_project() to upload the entire project directory. From then on, rely on fabric.contrib.project.rsync_project() to sync the project with your local version. Also of special interest is that this uses Unix rsync underneath and secure scp to transfer the .tar.gz data.
I guess this takes care of your needs. There might not be a need to set up a permanent mount etc.
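For illustration, a minimal Fabric 1.x fabfile along those lines (host, user, and paths are placeholders):
# fabfile.py (sketch, Fabric 1.x API) - push local media to the static box.
from fabric.api import env
from fabric.contrib.project import rsync_project, upload_project

env.hosts = ["static.example.com"]  # placeholder static-media server
env.user = "deploy"                 # placeholder user


def bootstrap():
    # One-off: copy the whole media tree over.
    upload_project(local_dir="media", remote_dir="/srv/media")


def sync_media():
    # Subsequent runs: rsync only what changed.
    rsync_project(remote_dir="/srv/media", local_dir="media/", delete=False)

# Run with: fab bootstrap   (first time)
#           fab sync_media  (after each upload batch)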
If your static media is derived from your development process, then Fabric is the way to go. It can automate the deployment and replication of anything-- data, application files, even static database dumps-- from anywhere to anywhere.
If your static media is derived from your application server's operation-- generated PDFs and images, uploads from your users, compiled binaries (I had a customer who wanted that, a Django app that would take in raw x86 assembly and return a compiled and linked binary)-- then you want to use Django Storages, an app that abstracts the actual storage of content for any ImageField or FileField (or anything with a Python File-like interface). It has support for storing them in databases, to Amazon S3, via FTP, and a few others.