Why is it recommended to add sqlite database to gitignore? - django

I understand other files being ignored. But why would I want to ignore the SQLite database file if that holds data needed to run the website? How can the website function without a database?

You probably only want to write to one instance of the file. This means it either lives in production or in your sandbox. If you change data in production, it's now newer than what you are tracking in git, and it will presumably be overwritten on the next deploy, causing data loss.
A couple of minor issues:
git doesn't perform well when you store large binary files in it.
git can track binary files (like images), but you don't get as much value out of it, such as being able to diff your .sqlite file before and after a change.

Because you typically want to use different databases in your testing and production environments.
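As a minimal sketch of that separation (all names here are illustrative), you could add db.sqlite3 to .gitignore and let each environment point Django at its own copy of the file, for instance via an environment variable:

import os
from pathlib import Path

BASE_DIR = Path(__file__).resolve().parent.parent

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        # Each machine can set DJANGO_DB_PATH; otherwise fall back to a local,
        # gitignored db.sqlite3 next to the project.
        "NAME": os.environ.get("DJANGO_DB_PATH", str(BASE_DIR / "db.sqlite3")),
    }
}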

Related

Why do we need to set up AWS and a Postgres DB when we deploy our app using Heroku?

I'm building a web API by following the YouTube video below, and up until the AWS S3 bucket setup I understood everything fine. But he first deploys everything locally, then after making sure everything works he transfers all the static files to AWS, and for the DB he switches from SQLite3 to Postgres.
django portfolio
I still don't understand why we need to put our static files on AWS and create a PostgreSQL database even though there is a default SQLite3 database from Django. I'm thinking that if I'm the only admin, just connecting my GitHub from Heroku should be enough, and any time I change something in the API I just need to push those changes to the GitHub master and that should be it.
Why do we need to use AWS to set up the static file location and set up an RDS (relational database) and do these things from the beginning? Still not getting it!
Can anybody help explain this?
Thanks
Databases
There are several reasons a video guide would encourage you to switch from SQLite to a database server such as MySQL or PostgreSQL:
SQLite is great but doesn't scale well if you're expecting a lot of traffic
SQLite doesn't work if you want to distribute your app across multiple servers. Going back to Heroku, if you serve your app with multiple Dynos, you'll have a problem because each Dyno will use a distinct SQLite database. If you edit something through the admin, it will happen on one of these databases, at random, leading to inconsistencies
Some Django features aren't available on SQLite
SQLite is the default database in Django because it works out of the box, and is extremely fast and easy to use in local/development environments for prototyping.
However, it is usually not suited for production websites. Additionally, while it can be tempting to store your sqlite.db file along with your code, for instance in a git repository, it is considered a bad practice because your database can contain sensitive data (such as passwords, usernames, emails, etc.). Hence, a strict separation between your code and data is a good practice.
Another way to put it is that your code and your data have different lifecycles. You want to be able to edit data in your database without redeploying your code, and update your code without touching your database.
Even if you can remove public access to some files through GitHub, this is not a good practice, because when you work in a team with multiple developers, the developers may have access to the code but not to the production data, because it's usually sensitive. If you work with 5 people and each one of them has a copy of your database, the risk of losing it or having it stolen is 5x higher ;)
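On Heroku specifically, the Postgres add-on usually exposes the connection details through the DATABASE_URL environment variable. A minimal sketch of reading it, assuming the third-party dj-database-url helper is installed (the fallback URL is illustrative):

import dj_database_url

DATABASES = {
    # Falls back to local SQLite when DATABASE_URL isn't set (e.g. in development)
    "default": dj_database_url.config(default="sqlite:///db.sqlite3"),
}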
Static files
When you work locally, Django's built-in runserver command handles the serving of static assets such as CSS, Javascript and images for you.
However, this server is not designed for production use either. It works great in development, but will start to fail very quickly on a production website, which has to handle far more requests than your local version.
Because of that, you need to host these static files somewhere else, and AWS is one place where you can do that. AWS will serve those files for you, in a very efficient way. There are other options available, for instance configuring a reverse proxy with Nginx to serve the files for you, if you're using a dedicated server.
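As a rough sketch of what the S3 route can look like in settings.py, assuming the third-party django-storages package with its boto3 backend (bucket and region names are illustrative):

# Add "storages" to INSTALLED_APPS, then point the static files storage at S3
STATICFILES_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
AWS_STORAGE_BUCKET_NAME = "my-portfolio-assets"   # illustrative bucket name
AWS_S3_REGION_NAME = "eu-west-1"                  # illustrative region
STATIC_URL = f"https://{AWS_STORAGE_BUCKET_NAME}.s3.amazonaws.com/"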
As far as I can tell, the progression you describe from the video is bringing you from a local, development environment to a more efficient and scalable production setup. That is to be expected, because it's less daunting to start with something really simple (SQLite, Django's built-in runserver) and move on to more complex and abstract topics and tools later on.

I am using the Flyway database migration tool. Can I archive artifacts for what was migrated, or do they have to stay under the current folder?

I am using the Flyway database migration tool. Let's assume I have 100 SQL scripts under a folder and I migrated them by applying them to a server. Later I added 50 more SQL scripts. Can I move these 100 old SQL scripts away and archive them somewhere (Artifactory or a remote share)? That way I only have the 50 SQL scripts which are new and needed for the current migration.
Is this possible? Or do all SQL scripts have to be present under the directory?
It would be useful if you could edit your question to state the reason why you want to only include the 50 scripts for the current migration, as this would help me recommend an approach.
There are two ways that I know of that you could use.
Create subfolders for each release, rather than have one single folder with an unmanageable list of scripts. This approach means that all migrations remain in the project, which has advantages such as being able to rebuild a test database from scratch (using flyway clean migrate) and also being able to reuse the project later on to repeat earlier deployments. If you do it this way, you will need to reference each subfolder using the flyway.locations parameter.
Remove the sql scripts that have already been run, as you have suggested. If you do this, you will need to run flyway baseline on any targets. This command will ensure that the targets don't expect migrations below a specific version, so the missing migration files won't confuse flyway.
My personal preference is the first option. If you can't use this for whatever reason, we'd be interested to understand why.
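A rough sketch of what these two options might look like in a flyway.conf, with folder names and the baseline version purely illustrative:

# Option 1: keep all migrations, organised into per-release subfolders
flyway.locations=filesystem:sql/release-1,filesystem:sql/release-2

# Option 2: after archiving the old scripts, baseline each target at the
# last archived version so Flyway no longer expects those files
# (then run the "flyway baseline" command against each target)
flyway.baselineVersion=100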

Working on the same git repo with two different PCs. Two different PostgreSQL settings in the settings.py file

I'm very new to databases and I'm trying to find out the best practice for what I'm trying to achieve.
I have one repository, which is a Django backend with a PostgreSQL database attached. I'm working with this on my main PC, but recently I've had to work on my laptop. My laptop has another PostgreSQL database already running on 5432, so I've had to change some of that info to use port 54324. I don't want these changes pushed to the repository, but I would still like to track the settings.py file in the repository. So far I've just created a branch for each PC to maintain the separate settings, but I'm sure this is not a great way to do it. I've heard about setting up environment files, but I'm unsure whether this is the 'right way' to do it either.
I'm a little confused with the best way I can do this, hopefully I'm making sense. Any help would be appreciated greatly.
Thanks,
Darren
This is normally solved with a properties file that is ignored. What you keep is a sample file (with a different name) that you do track in git and change accordingly. Your Python scripts read the properties file and everybody should be happy.
Besides eftshift0's answer, consider having a committed config.defaults.py file that sets default configuration values that may be overridden by a per-site config.local.py file. If the default configuration works for you, you don't need to create the per-site config. If not, create the per-site config. Never commit (and do .gitignore) the per-site config.
The configuration files might even be located outside the repository proper, but the overall idea still applies. The distributed (and committed) configuration file is a sample and/or default, and the actual site settings are kept in some other file that is never committed.
If you already have a single config.py or settings.py, you can establish this configuration pattern by adding site.py (use whatever name you want for this per-site setting file) as an ignored file. Read the new file, if it exists, such that the site settings override the default settings from the existing tracked file, and you're good to go.
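A minimal sketch of this pattern for the Postgres-port situation above (file names and values are illustrative; the override file is listed in .gitignore):

# settings.py (tracked): project defaults, e.g. the main PC's Postgres port
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "myapp",
        "PORT": "5432",
    }
}

# Per-machine overrides win over the defaults above
try:
    from settings_site import *
except ImportError:
    pass

# settings_site.py on the laptop (gitignored): redefine only what differs
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "myapp",
        "PORT": "54324",  # this machine's Postgres listens on a non-default port
    }
}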

Should Django migrations live in source control?

As the title says... I'm not sure if Django migrations should live in source control.
For:
If they get accidentally deleted from my local machine, it's going to cause me issues next time I want to run a migration... right? So it would be useful for me to have them.
Against:
Devs setting up the project for the first time shouldn't need to run them; they can just work straight from the models file.
They seem like machine-specific cruft.
Could they potentially reveal things I don't want about the database?
Yes, absolutely!!
From the docs:
The migration files for each app live in a “migrations” directory inside of that app, and are designed to be committed to, and distributed as part of, its codebase. You should be making them once on your development machine and then running the same migrations on your colleagues’ machines, your staging machines, and eventually your production machines.
One big point is that migrations should always be tested before you deploy them in production. You should never create migrations on production, only apply them.
You also want to synchronise the state of the models in source control with the state of the database. If someone pulls your branch, has to find a bug, and goes back in the source control's history, he'd need the migration files to change the state of the database to match that point in time. If he has to create his own migration files, they won't include the intermediate state, and he runs into a problem where his models are out-of-sync with the database.

Mercurial: keep 2 branches in sync but with certain persistent differences?

I'm a web developer working on my own using django, and I'm trying to get my head round how best to deploy sites using mercurial. What I'd like to have is to be able to keep one repository that I can use for both production and development work. There will always be some differences between production/development (e.g. they might use different databases, development will always have debug turned on) but by and large they will be in sync. I'd also like to be able to make changes directly on the production server (tidying up html or css, simple bugfixes etc.).
The workflow that I intend to use for doing this is as follows:
Create 2 branches, prod and dev (all settings initially set to production settings)
Change settings.py and a few other things in the dev branch. So now I've got 2 heads, and from now on the repository will always have 2 heads.
(On dev machine) Make changes to dev, then use 'hg transplant' to copy relevant changesets to production.
push to master repository
(On production server) Pull from master repo, update to prod head
Note: you can also make changes straight to prod so long as you transplant the changes into dev.
This workflow has the drawback that whenever you make a change, not only do you have to commit it to whichever branch you make the change on, you also have to transplant it to the other branch. Is there a more sensible way of doing what I want here, perhaps using patches? Or failing that, is there a way of automating the commit process to automatically transplant the changeset to the other branch, and would this be a good idea?
I'd probably use Mercurial Queues for something like this. Keep the main repository as the development version, and have a for-production patch that makes any necessary changes for production.
Here are two possible solutions one using mercurial and one not using mercurial:
Use the hostname to switch between prod and devel. We have a single check at the top of our settings file that looks at the SERVER_NAME environment variable. If it's www.production.com it's the prod DB and otherwise it picks a specified or default dev/test/stage DB.
Using Mercurial, just have a clone that's dev and a clone that's prod, make all changes in dev, and at deploy time pull from dev to prod. After pulling you'll have 2 heads in prod diverging from a single common ancestor (the last deploy). One head will have a single changeset containing only the differences between dev and prod deployments, and the other will have all the new work. Merge them in the prod clone, selecting the prod changes on conflict of course, and you've got a deployable setup, and are ready to do more work on 'dev'. No need to branch, transplant, or use queues. So long as you never pull that changeset with the prod settings into 'dev' it will always need a merge after pulling from dev, and if it's just a few lines there's not much to do.
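A sketch of the first option (the hostname-based switch) in settings.py; the hostname and database names are illustrative:

import os

# Pick the database based on where the code is running
if os.environ.get("SERVER_NAME") == "www.production.com":
    DATABASES = {"default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "prod_db",
    }}
else:
    DATABASES = {"default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": "dev.sqlite3",
    }}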
I've solved this with local settings.
Append to settings.py:
try:
    from local_settings import *
except ImportError:
    pass
touch local_settings.py
Add ^local_settings.py$ to your .hgignore
Each deploy I do has its own local settings (typically different DB stuff and different origin email addresses).
PS: I only read the "minified versions of javascript" portion later. For this, I would suggest a post-update hook and a config setting (like JS_EXTENSION).
Example (from the top of my head! not tested, adapt as necessary):
Put JS_EXTENSION = '.raw.js' in your settings.py file;
Put JS_EXTENSION = '.mini.js' in your local_settings.py file on the production server;
Change JS inclusion from:
<script type="text/javascript" src="blabla.js"></script>
To:
<script type="text/javascript" src="blabla{{JS_EXTENSION}}"></script>
Make a post-update hook that looks for *.raw.js and generates .mini.js (minified versions of raw);
Add .mini.js$ to your .hgignore
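A rough sketch of what the hook's script could do, assuming a command-line minifier such as uglifyjs is available and the static/ path is illustrative; you would wire it up as a post-update hook in .hg/hgrc:

# minify_js.py: regenerate *.mini.js files from their *.raw.js sources
import glob
import subprocess

for raw in glob.glob("static/**/*.raw.js", recursive=True):
    mini = raw[: -len(".raw.js")] + ".mini.js"
    subprocess.run(["uglifyjs", raw, "-o", mini], check=True)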
Perhaps try something like this: (I was just thinking about this issue, in my case it's a sqlite database)
Add settings.py to .hgignore, to keep it out of the repository.
Take your settings.py files from the two separate branches and move them into two separate files, settings-prod.py and settings-dev.py
Create a deploy script which copies the appropriate settings-X file to settings.py, so you can deploy either way.
If you have a couple of additional files, do the same thing for them. If you have a lot of files but they're all in the same directory by themselves, you could just create a pair of directories: production and development, and then either copy or symlink the appropriate one into a deploy directory.
If you did something like this, you could dispense with the need for branching your repository.
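A tiny deploy script along those lines might look like this (the settings-X file names match the ones above, the rest is illustrative):

# deploy.py: copy the settings file for the chosen environment into place
import shutil
import sys

env = sys.argv[1] if len(sys.argv) > 1 else "dev"   # "dev" or "prod"
shutil.copyfile(f"settings-{env}.py", "settings.py")
print(f"Installed settings-{env}.py as settings.py")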
I actually do this using named branches and straight merging instead of transplanting (which is more reliable, IMO). This usually works, although sometimes (when you've edited the different files on the other branch), you'll need to pay attention not to remove the differences again when you're merging.
So it works great if you're not changing the different files much.