Saving images on files or on database when using docker - django

I'm using docker for a project, the main focus for its usage is to make the application available even if one of the node (it's a 6 nodes cluster with docker swarm) is down.
The application is basically a Django App that can save some images from users and others models. I'm currently saving the images as files, but since I need to specify a volume locally for a single machine, I would like to know if it would be better to save the images on database cluster, so it would be available even if the whole node goes down. Or is there another way?
#Edit
Note: The cluster runs locally and doesn't have internet access

The two options are two perform the file sharing via database or via the file system.
For file system sharing, you can use something like GlusterFS, so for each container it seems like they are mounting a host-local volume, but it's actually shared via GlusterFS between the hosts.
To my mind, if it's your application (e.g you can modify it at will), saving stuff in database would be the easier approach for most developers.

The best solution is often to go for a hosted option (such as MongoDB Atlas). Making a database resilient and highly available is really hard, and unless you are an expert on docker and mongo I would strongly recommend you to go for a hosted option.

Related

Why we need to setup AWS and POSTgres db when we deploy our app using Heroku?

I'm building a web api by watching the youtube video below and until the AWS S3 bucket setup I understand everything fine. But he first deploy everything locally then after making sure everything works he is transferring all static files to AWS and for DB he switches from SQLdb3 to POSgres.
django portfolio
I still don't understand this part why we need to put our static files to AWS and create POSTgresql database even there is an SQLdb3 default database from django. I'm thinking that if I'm the only admin and just connecting my GitHub from Heroku should be enough and anytime I change something in the api just need to push those changes to github master and that should be it.
Why we need to use AWS to setup static file location and setup a rds (relational data base) and do the things from the beginning. Still not getting it!
Can anybody help to explain this ?
Thanks
Databases
There are several reasons a video guide would encourage you to switch from SQLite to a database server such as MySQL or PostgreSQL:
SQLite is great but doesn't scale well if you're expecting a lot of traffic
SQLite doesn't work if you want to distribute your app accross multiple servers. Going back to Heroky, if you serve your app with multiple Dynos, you'll have a problem because each Dyno will use a distinct SQLite database. If you edit something through the admin, it will happen on one of this databases, at random, leading to inconsistencies
Some Django features aren't available on SQLite
SQLite is the default database in Django because it works out of the box, and is extremely fast and easy to use in local/development environments for prototyping.
However, it is usually not suited for production websites. Additionally, while it can be tempting to store your sqlite.db file along with your code, for instance in a git repository, it is considered a bad practice because your database can contain sensitive data (such as passwords, usernames, emails, etc.). Hence, a strict separation between your code and data is a good practice.
Another way to put it is that your code and your data have different lifecycles. You want to be able to edit data in your database without redeploying your code, and update your code without touching your database.
Even if you can remove public access to some files through GitHub, this is not a good practice because when you work in a team with multiple developpers, developpers may have access to the code but not the production data, because it's usually sensitive. If you work with 5 people and each one of them has a copy of your database, it means the risk to lose it or have it stolen is 5x higher ;)
Static files
When you work locally, Django's built-in runserver command handles the serving of static assets such as CSS, Javascript and images for you.
However, this server is not designed for production use either. It works great in development, but will start to fail very fast on a production website, that should handle way more requests than your local version.
Because of that, you need to host these static files somewhere else, and AWS is one place where you can do that. AWS will serve those files for you, in a very efficient way. There are other options available, for instance configuring a reverse proxy with Nginx to serve the files for you, if you're using a dedicated server.
As far as I can tell, the progression you describe from the video is bringing you from a local, development enviromnent to a more efficient and scalable production setup. That is to be expected, because it's less daunting to start with something really simple (SQLite, Django's built-in runserver), and move on to more complex and abstract topics and tools later on.

How Docker and Ansible fit together to implement Continuous Delivery/Continuous Deployment

I'm new to the configuration management and deployment tools. I have to implement a Continuous Delivery/Continuous Deployment tool for one of the most interesting projects I've ever put my hands on.
First of all, individually, I'm comfortable with AWS, I know what Ansible is, the logic behind it and its purpose. I do not have same level of understanding of Docker but I got the idea. I went through a lot of Internet resources, but I can't get the the big picture.
What I've been struggling is how they fit together. Using Ansible, I can manage my Infrastructure as Code; building EC2 instances, installing packages... I can even deploy a full application by pulling its code, modify config files and start web server. Docker is, itself, a tool that packages an application and ensures that it can be run wherever you deploy it.
My problems are:
How does Docker (or Ansible and Docker) extend the Continuous Integration process!?
Suppose we have a source code repository, the team members finish working on a feature and they push their work. Jenkins detects this, runs all the acceptance/unit/integration test suites and if they all passed, it declares it as a stable build. How Docker fits here? I mean when the team pushes their work, does Jenkins have to pull the Docker file source coded within the app, build the image of the application, start the container and run all the tests against it or it runs the tests the classic way and if all is good then it builds the Docker image from the Docker file and saves it in a private place?
Should Jenkins tag the final image using x.y.z for example!?
Docker containers configuration :
Suppose we have an image built by Jenkins stored somewhere, how to handle deploying the same image into different environments, and even, different configurations parameters ( Vhosts config, DB hosts, Queues URLs, S3 endpoints, etc...) What is the most flexible way to deal with this issue without breaking Docker principles? Are these configurations backed in the image when it gets build or when the container based on it is started, if so how are they injected?
Ansible and Docker:
Ansible provides a Docker module to manage Docker containers. Assuming I solved the problems mentioned above, when I want to deploy a new version x.t.z of my app, I tell Ansible to pull that image from where it was stored on, start the app container, so how to inject the configuration settings!? Does Ansible have to log in the Docker image, before it's running ( this sounds insane to me ) and use its Jinja2 templates the same way with a classic host!? If not, how is this handled?!
Excuse me if it was a long question or if I misspelled something, but this is my thinking out loud. I'm blocked for the past two weeks and I can't figure out the correct workflow. I want this to be a reference for future readers.
Please, it would very helpful to read your experiences and solutions because this looks like a common workflow.
I would like to answer in parts
How does Docker (or Ansible and Docker) extend the Continuous Integration process!?
Since docker images same everywhere, you use your docker images as if they are production images. Therefore, when somebody committed a code, you build your docker image. You run tests against it. When all tests pass, you tag that image accordingly. Since docker is fast, this is a feasible workflow.
Also docker changes are incremental; therefore, your images will have minimal impact on storage. Also when your tests fail, you may also choose to save that image too. In this way, developer will pull that image and investigate easily why your tests failed. Developer may choose to run tests in their machine too since docker images in jenkins and their machine are not different.
What this brings that all developers will have same environment, same version of all software since you decide which one will be used in docker images. I have come across to bugs that are due to differences between developer machines. For example in the same operating system, unicode settings may affect your code. But in docker images all developers will test against same settings, same version software.
Docker containers configuration :
If you are using a private repository, and you should use one, then configuration changes will not affect hard disk space much. Therefore except security configurations, such as db passwords, you can apply configuration changes to docker images(Baking the Configuration into the Container). Then you can use ansible to apply not-stored configurations to deployed images before/after startup using environment variables or Docker Volumes.
https://dantehranian.wordpress.com/2015/03/25/how-should-i-get-application-configuration-into-my-docker-containers/
Does Ansible have to log in the Docker image, before it's running (
this sounds insane to me ) and use its Jinja2 templates the same way
with a classic host!? If not, how is this handled?!
No, ansible will not log in the Docker image, but ansible with Jinja2 templates can be used to change dockerfile. You can change dockerfile with templates and can inject your configuration to different files. Tag your files accordingly and you have configured images to spin up.
Regarding your question about handling multiple environment configurations using the same Docker image, I have been planning on using a Service Discovery tool like Consul as a centralized config/property management tool. So, when you start your container up, you set an ENV var that tells it what application it is (appID), and what environment config it should use (ex: MyApplication:Dev) and it will pull its config from Consul at startup. I still have to investigate the security around Consul (as if we are storing DB connection credentials in there for example, how do we restrict who can query/update those values). I don't want to just use this for containers, but all apps in general. Another cool capability is to change the config value in Consul and have a hook back into your app to apply the changes immediately (maybe like a REST endpoint on your app to push changes down to and dynamically apply it). Of course your app has to be written to support this!
You might be interested in checking out Martin Fowler's blog articles on immutable infrastructure and on Phoenix servers.
Although not a complete solution, I have suggestions for two of your issues. Although they might not be perfect, these are the practices we are using in our workflow, and prove themselves so far.
Defining different environments - supposing you've written a different Ansible role for each environment you launch, we define an environment variable setting the environment we wish the container to belong to. We then download the suitable configuration file from an S3 bucket using the env variable set before into the container (which should be possible if you supply AWS creds or give your server an IAM role) and inject these parameters into the code when building it.
Ansible doesn't need to log into the docker app, but the solution is a bit tricky. I've tried two ways of tackling this problem, and both aren't ideal. The first one is to download the configuration file as part of the docker image command line, and build the app on container startup. While this solution works - it breaches the Docker philosophy and makes the image highly prone to build errors.
Another solution is pushing several images to your docker hub repo, and then pulling the appropriate image according to the environment at hand.
In a broader stroke, I've tried launching our app completely with Ansible and it was hell, many configuration steps are tricky and get trickier when you try to implement them as a playbook. When I switched to maintaining the severs alone with Ansible, and deploying the app itself with Docker things got a lot easier.

Adding files for application to use on Cloud Foundry

For an application I'm converting to the Cloud Foundry platform, I have a couple of template files. These are basically templates for documents that will be converted to PDF's. What are my options when it comes to having these available to my application? There are no persistent system drives, so I can't just upload them, it seems. Cloud Foundry suggests for you to save them on something like Amazon S3, Dropbox or Box, or simply having them in a database as blobs, but this seems like a very curious work-around.
These templates will change separately from application files, so I'm not intending to have them in the application Jar.
Cloud Foundry suggests for you to save them on something like Amazon S3, Dropbox or Box, or simply having them in a database as blobs, but this seems like a very curious work-around.
Why do you consider this a curious work-around?
One of the primary benefits of Cloud Foundry is elastic scalability. Once your app is running on CF, you can easily scale it up and down on demand. As you scale up, new copies of the app are started in fresh containers. As you scale down, app containers are destroyed. Only the files that are part of the original app push are put into a fresh container.
If you have files like these templates that are changing over time and you store them in the container's file system, you would need to make sure that all instances of the app have the same copies of the template file at all times as you scale up and down. When you upload new templates, you would have to make sure they get distributed to all instances, not just the one instance processing the upload. As new app instances are created on scale-up, you would need to make sure they have the latest versions of the templates.
Another benefit of CF is app health management. If an instance of your app crashes for any reason, CF will detect this and start a new instance in a fresh container. Again, only files that were part of the original app push will be dropped into the fresh container. You would need to make sure that the latest version of the template files got added to the container after startup.
Storing files like this that have a lifecycle separate from the app in a persistence store outside of the app container solves all these problems, and ensures that all instances of the app get the same versions of the files as you scale instances up and down or as crashed instances are resurrected.

Deploying WordPress on Elastic Beanstalk?

Suppose I create a site in Wordpress, which is running on Elastic Beanstalk. Now, on the running app I will create posts /pages, upload images, etc. That is, some data, videos, files and records in a database will be added to the running application.
3 questions:
If WordPress is running on Elastic Beanstalk with multiple Amazon EC2 instances actually running my WordPress install, then will those files propagate automatically to all running instances? And will this also happen, if a new EC2 instance is fired up - for example, to handle increased load?
From what I see in AWS console, I can deploy different versions of an app-- but as per scenario above, if I deploy a new version, wont I lose all the files uploaded directly into running app (i.e. files and database records)? How do I keep those and at the same time deploy a new version of my app?
The WordPress team keeps issuing upgrades. Can I directly upgrade my running WordPress install, through the web interface? Or do I have to first upgrade my local version of WordPress, and then upload the new version of the app to Beanstalk? If I have to upgrade my local version and then upload, then again I am back to point 1, i.e. changes made by users directly to the older version of running app. How do I preserve those changes?
I've been working on this as well, and have learned a couple of things that are relevant here -- your question about uploads in particular has been on my mind:
(1) The best way to handle uploads, it seems to me, is to either go the NFS/NAS route like you suggest, but one better than that is to use an Amazon S3 plugin for WordPress, so that any uploads automatically copy up to S3 and the URLs in your WordPress media library reflect the FQDN of your bucket and not your specific site. That way you could have one or ten WP nodes in your Beanstalk and media/images are independent of any one of those servers.
(2) You should absolutely be using RDS here. Few things are easier to work with and as stress-free as a Multi-AZ, reserved MySQL RDS instance. Either that or your own EC2 running MySQL that is independent of the Beanstalk, but why run that when RDS is so much easier?
(3) Yes you definitely have to commit changes to your Git repository or local file first (new plugins, changes to themes, WP upgrades) and then upload/install as a revision to the Beanstalk code. Otherwise, all the changes you make via the web interface to one node will never be in the new load for a new node -- in fact you'll have an upgraded database but an older set of code in the Beanstalk application, so it's likely to create errors of some kind or another.
I took an AWS architecture course, and their advice for EC2 and the Beanstalk is to start to think about server instances as very disposable -- so you should try to think about easy ways for your boxes to provision themselves in the bootstrapping process and to take over work for one another without any precious resources on just one box. So losing an instance should never be a big deal. (This is definitely not how we thought in the world of physical servers, where we got everything tweaked 'just so'.)
Good luck!
Well, I'm no expert, but since no one has answered, I'll give it my best shot.
You are absolutely right--kind of. While each EC2 instance does have some local storage, it is destroyed and reset with each new instance. Because of this, Amazon has things like Elastic Block Storage and S3 for persistent files. I don't know how one would configure WP to utilize this, but that will likely be the solution.
I think this problem is solved by my answer to #1. As for the database, all of your EC2 instances should be pulling from the same RDS location. Again, while you could have MySQL running on each EC2 instance, in the interest of persistence, having a separate database makes more sense.
You, again, have most everything right. Local development should always precede live deployment. Upgrading locally then pushing to the live servers will make sure all of your instances remain identical.
Truth-be-told, I am still in the process of learning all of this too, as I said, I'm not an expert. Hopefully someone else will come along and give a more informed answer. However, the key conceptual hurdle here is the idea of elastic scalability--and the salient point of this idea is the separation of elements between what is elastic/scalable/disposable and what is persistent.
Hopefully that helps.
I have deployed a small Wordpress site on EB, S3 and RDS. S3 holds all static data, such as media uploads. This works through a plugin. RDS holds the database. EB holds the latest deployed application. The application is deployed from my dev environment, with a build script. This way, I just have to press one button and I redeploy.
I wrote an article about it here: http://www.cortexcode.com/wordpress-to-aws-code-example/
While it was at first annoying to work with, the speed of AWS is nice and now it's easier than ever. It used to be that I had to upload a bunch of files over FTP, this is way more efficient. :-)
As an addition to all the great answers already:
1) I can highly recommend EFS but also S3 for media files, so they are served from high availability regions in combination with cloudfront. For Wordpress there is one plugin that really speeds up this ( not affiliated to them, just really like the plugin ). There is also an assets plugin, if you'd like to serve JS, CSS files from S3. For the EFS solution, take a look at the AWSlabs docs on git, and specifically this file on how they mount the uploads file.
In general, EBS is really great for Wordpress, but you'll need to think in a different mindset as compared to other hosting solutions ( shared hosting, managed hosting ).
OK I researched a lot on this particular issue, and this is what I learned--
(1) If a wordpress user uploads some files, then his files will be uploaded only to the virtual machine that is actually serving his request at that time. Eg if currently the wordpress site is cloud-deployed and is using 5 virtual machines, now when user makes request he is directed to one virtual machine-- the one with the lowest load at that point... His uploads are stored only on that server. Current Platform-as-a-service solutions (like Amazon Elastic Beanstalk and App Fog) do not have the ability to propagate the changes to all the running instances. Either that (propagate changes to all servers) or use a common storage by all running instances-- these are the only 2 solutions to this problem. (Eg of common storage would be all 5 running virtual machines using Network-Attached-Storage (NAS)... )
(2) With ref to platforms available currently like Amazon Elastic Beanstalk and App Fog, for example, even if user made changes directly to running app- these platforms rely on the local version of code (which the admin deployed initially to cloud)- and there is no way to update the local version of code (on admin's PC) with the changes made by a user to running app-- hence these changes viz, files are lost-- Similarly, changes in database by user to running app are also lost-- unless the admin is using exactly the same database for his local app (that he deployed to cloud)
(3) Any changes to running apps first have to be made to the local app on admin's PC and then pushed to cloud.
I am working on a Cloud PaaS that addresses all these concerns-- viz updates can be made to running apps, code changes made to running app are also updated in code repository accessible by user...The Proof of concept is ready, hopefully it will be as good as I hope it should be :) -- currently the only thing that is actually there is the website (anyacloudpanel.com) -- design work is going on :)
If there is some rule that I should not mention my website( Anya Cloud Panel) -- then I am sorry -- pls feel free to edit and remove my website URL from my answer :)
Thanks,
Arvind.
Deploying WordPress to AWS Elastic Beanstalk does require some change to the normal WordPress deployment as mentioned here a few times. To answer your questions, here is a great tutorial explaining stateless applications and how to deploy to Elastic Beanstalk:
Deploying WordPress to Amazon Web Services AWS EC2 and RDS via ElasticBeanstalk
Be careful if you use a theme from themeforest for example. Some of them are incompatible with wordpress S3 plugin. Then you're screwed, you can not deploy your wordpress on the cloud.

Django cluster deployment

I have five nodes behind a load balancer and I'm trying to determine the optimal configuration for a Django based site.
Each node has access to Postgres, mod_wsgi, Apache, Lighttpd, memcached, pgpool2 (for database replication) and glusterfs(for media file replication) and is running Ubuntu 8.04LTS.
So far, the setup is four nodes running Apache/Lighttpd/memcached/pgpool2 all reading/writing to one master node that is running the "master" Postgresql. Each of the four web nodes is also running Postgres for replication from the master via pgpool.
So, my question is: How would you configure this setup and/or what would you change so that there is no single point of failure, if possible?
This sounds like a good setup, although its hard to know exactly what your setup looks like. In terms of memory etc. and what traffic you expect to handle.
You might want to consider using Django's multidb support and have a read only postgres instance (use DB routing to direct reads to the read only for certain apps). This can offer up some quite nice speed improvements - and at the moment you could have a potential bottleneck at the single postgres instance depending how heavy your database work is.
As #ashwoods suggested, it might be working looking into gunicorn and nginx. I guess at the moment you use Apache only to run mod_wsgi? And lighttpd for the static files? With nginx, you can use it with a number of wsgi servers and its great for static files too.
The setup looks pretty good to me. I would consider using gunicorn/uwsgi + nginx. I would also benchmark using pbbouncer, although pgpool2 offers more out of the box.