Manage SQLite database with Git - Django

I have this small project that specifies SQLite as the database choice.
For this particular project, the framework is Django, and the server is hosted on Heroku. In order for the database to work, it must be set up with migration commands and credentials whenever the project is deployed to a continuous integration tool or a development site.
The problem is that many of these environments do not actually use the my_project.sqlite3 file that comes with the source repository, which we version-control with Git. How do I incorporate changes into the deployed database? Is a script that sets up the database suitable for this scenario? Meanwhile, it is worth noting that there are security credentials that should not appear in a script unencrypted, which makes the situation tricky.

many of these environments do not actually use the my_project.sqlite3 file that comes with the source repository
If your deployment platform does not support your chosen database, then your development environment should probably move to one of the databases they do support. It is possible to run different databases in development and production, but that just seems like a source of headaches.
I have found a number of articles that state that Heroku just doesn't support SQLite in production and instead recommends Postgres.
How do I incorporate changes into the deployed database? Is a script that sets up the database suitable for this scenario?
I assume that you are just extracting data from one database to give to another, so yes, as long as that script is a one-time batch operation each time the code is updated, it should be fine. You will want something else if you are adding or manipulating data in production and then exporting it back to your Git repository.
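For illustration, such a one-shot script could be little more than the sketch below; the fixture name is a placeholder, and it assumes the standard manage.py commands:

# deploy_db.py -- hypothetical one-shot database setup run on each deploy
import subprocess

# Apply any pending schema migrations.
subprocess.run(['python', 'manage.py', 'migrate', '--noinput'], check=True)

# Load seed data kept in the repository (the fixture name is a placeholder).
subprocess.run(['python', 'manage.py', 'loaddata', 'seed_data.json'], check=True)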
Meanwhile, it is worth noting that there are security credentials that should not appear in a script unencrypted
Environment variables should solve that. You set up your host machine with environment variables holding your credentials and then just retrieve them within the script. You are looking at something like this:
import os

# Set environment vars (normally these would be set on the host itself,
# not hard-coded in the script)
os.environ['USER'] = 'username'
os.environ['PASSWORD'] = 'password'

# Get environment vars
USER = os.getenv('USER')
PASSWORD = os.environ.get('PASSWORD')
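In a Django project, those variables are typically consumed in settings.py rather than set in code. A minimal sketch, where the variable names (DB_NAME, DB_USER, and so on) are placeholders:

# settings.py -- hypothetical sketch; the variable names are placeholders
import os

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': os.environ.get('DB_NAME', 'my_project'),
        'USER': os.environ.get('DB_USER'),
        'PASSWORD': os.environ.get('DB_PASSWORD'),
        'HOST': os.environ.get('DB_HOST', 'localhost'),
        'PORT': os.environ.get('DB_PORT', '5432'),
    }
}

This way the credentials live only in the environment (Heroku config vars, your CI tool's secrets, etc.) and never in the repository.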

Related

Why do we need to set up AWS and a Postgres DB when we deploy our app using Heroku?

I'm building a web API by following the YouTube video below, and up until the AWS S3 bucket setup I understood everything fine. But he first deploys everything locally, then after making sure everything works he transfers all the static files to AWS, and for the DB he switches from SQLite3 to Postgres.
django portfolio
I still don't understand this part: why do we need to put our static files on AWS and create a PostgreSQL database even though there is a SQLite3 default database from Django? I'm thinking that if I'm the only admin, just connecting my GitHub from Heroku should be enough, and any time I change something in the API I just need to push those changes to the GitHub master branch and that should be it.
Why do we need to use AWS to set up the static file location, set up an RDS (relational database), and do things from the beginning? Still not getting it!
Can anybody help explain this?
Thanks
Databases
There are several reasons a video guide would encourage you to switch from SQLite to a database server such as MySQL or PostgreSQL:
SQLite is great but doesn't scale well if you're expecting a lot of traffic
SQLite doesn't work if you want to distribute your app across multiple servers. Going back to Heroku, if you serve your app with multiple Dynos, you'll have a problem because each Dyno will use a distinct SQLite database. If you edit something through the admin, it will happen on one of these databases at random, leading to inconsistencies
Some Django features aren't available on SQLite
SQLite is the default database in Django because it works out of the box, and is extremely fast and easy to use in local/development environments for prototyping.
However, it is usually not suited for production websites. Additionally, while it can be tempting to store your sqlite.db file along with your code, for instance in a git repository, it is considered a bad practice because your database can contain sensitive data (such as passwords, usernames, emails, etc.). Hence, a strict separation between your code and data is a good practice.
Another way to put it is that your code and your data have different lifecycles. You want to be able to edit data in your database without redeploying your code, and update your code without touching your database.
Even if you can remove public access to some files through GitHub, this is not a good practice, because when you work in a team with multiple developers, developers may have access to the code but not to the production data, which is usually sensitive. If you work with 5 people and each one of them has a copy of your database, the risk of losing it or having it stolen is 5x higher ;)
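In practice, a common pattern on Heroku is to keep SQLite for local development and let production use Postgres. A minimal sketch of the settings, assuming the third-party dj-database-url package and the DATABASE_URL config var that Heroku provides:

# settings.py -- sketch: SQLite in development, Postgres on Heroku
from pathlib import Path

import dj_database_url  # third-party package, assumed installed

BASE_DIR = Path(__file__).resolve().parent.parent

DATABASES = {
    # Reads DATABASE_URL if present (as on Heroku), otherwise falls
    # back to a local SQLite file for development.
    'default': dj_database_url.config(
        default='sqlite:///%s' % (BASE_DIR / 'db.sqlite3'),
        conn_max_age=600,
    )
}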
Static files
When you work locally, Django's built-in runserver command handles the serving of static assets such as CSS, JavaScript and images for you.
However, this server is not designed for production use either. It works great in development, but will start to fail very quickly on a production website, which has to handle far more requests than your local version.
Because of that, you need to host these static files somewhere else, and AWS is one place where you can do that. AWS will serve those files for you, in a very efficient way. There are other options available, for instance configuring a reverse proxy with Nginx to serve the files for you, if you're using a dedicated server.
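For the AWS route, a minimal sketch using the third-party django-storages package (the bucket name and region below are placeholders):

# settings.py -- sketch: serve static files from S3 via django-storages
# Assumes `pip install django-storages boto3`
import os

INSTALLED_APPS = [
    # ... the rest of your apps ...
    'storages',
]

AWS_STORAGE_BUCKET_NAME = 'my-bucket'    # placeholder
AWS_S3_REGION_NAME = 'us-east-1'         # placeholder
AWS_ACCESS_KEY_ID = os.environ.get('AWS_ACCESS_KEY_ID')          # keep credentials
AWS_SECRET_ACCESS_KEY = os.environ.get('AWS_SECRET_ACCESS_KEY')  # out of the code

STATICFILES_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage'
STATIC_URL = 'https://%s.s3.amazonaws.com/' % AWS_STORAGE_BUCKET_NAME

With something like this in place, python manage.py collectstatic uploads the files to the bucket instead of a local directory.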
As far as I can tell, the progression you describe from the video takes you from a local development environment to a more efficient and scalable production setup. That is to be expected, because it's less daunting to start with something really simple (SQLite, Django's built-in runserver) and move on to more complex and abstract topics and tools later on.

Deploying Django as a standalone internal app?

I'm developing a tool using Django for internal use at my organization. It's used to search and tag documents (using Haystack and Solr), and will be employed on different projects. My team currently has a working prototype and we want to deploy it 'in the wild.'
Our security environment is strict. Project documents are located in subfolders on a network drive, and access to these folders is restricted based on users' Windows credentials (we also have an MS SQL server that uses the same credentials). A user can only access the projects they are involved in. Since we're an exclusively Microsoft shop, if we want to deploy our app on the company intranet, we'll need to use an IIS server to deal with these permissions. No one on the team has the requisite knowledge to work with IIS or Active Directory, and our IT department is already over-extended. In short, we're not web developers and we don't have immediate access to anybody experienced.
My hacky solution is to forgo IIS entirely and have each end user run a lightweight server locally (namely, CherryPy), with each retaining access to a common project-specific database (e.g. a SQLite DB living on the network drive or a DB on the MS SQL server). To use the tool, they would just launch an all-in-one batch script and point their browser to 127.0.0.1:8000. I recognize how ugly this is, but I feel like it leverages the security measures already in place (note that we never expect more than 10 simultaneous users on a given project). Is this a terrible idea, and if so, what's a better solution?
I've dealt with a similar situation (primary development was geared toward a normal deployment situation, but some users have a requirement to use the application on a standalone workstation). Rather than deploy web and db servers on a standalone workstation, I just run the app with the Django internal development server and a SQLite DB. I didn't use CherryPy, but hopefully this is somewhat useful to you.
My current solution makes a nice executable for users not familiar with the command line (who also have trouble remembering the URL to put in their browser), but it also keeps development relatively easy:
Use PyInstaller to package up the Django app into a single executable. Once you figure this out, don't continue to do it by hand; add it to your continuous integration system (or at least write a script).
Modify manage.py to (a sketch follows this list):
Detect whether the app is frozen by PyInstaller and was run with no arguments (i.e. the user executed it by double-clicking it), and if so, run execute_from_command_line(..) with arguments to start the Django development server.
Right before running execute_from_command_line(..), spin off a thread that does a time.sleep(2) (to let the development server come up fully) and then a webbrowser.open_new("http://127.0.0.1:8000").
Modify the app's settings.py to detect if frozen and change things around such as the path to the DB server, enabling the development server, etc.
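Putting the manage.py changes together, here is a rough sketch of what that could look like; the project name "mysite" is a placeholder, and PyInstaller is what sets sys.frozen on the bundled executable:

#!/usr/bin/env python
# manage.py -- rough sketch; 'mysite' is a placeholder project name
import os
import sys
import threading
import time
import webbrowser

from django.core.management import execute_from_command_line

def open_browser():
    # Give the development server a moment to come up, then open the UI.
    time.sleep(2)
    webbrowser.open_new('http://127.0.0.1:8000')

if __name__ == '__main__':
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'mysite.settings')
    if getattr(sys, 'frozen', False) and len(sys.argv) == 1:
        # Frozen by PyInstaller and launched with no arguments
        # (i.e. the user double-clicked it): start the dev server.
        threading.Thread(target=open_browser, daemon=True).start()
        execute_from_command_line([sys.argv[0], 'runserver', '--noreload'])
    else:
        execute_from_command_line(sys.argv)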
A couple additional notes.
If you go with SQLite, Windows file locking on network shares may not be adequate if you have concurrent writes to the DB; concurrent readers should be fine. Additionally, since you'll have different DB files for different projects, you'll have to figure out a way for the user to indicate which file to use. Maybe prompt in the app, or build the same app multiple times with different settings.py files. There are a variety of ways to hit this nail...
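As one hypothetical illustration of the settings.py route, the fragment below lets an environment variable pick the DB file; the variable name PROJECT_DB and the UNC path are made up:

# settings.py -- hypothetical fragment; PROJECT_DB and the path are made up
import os
import sys

FROZEN = getattr(sys, 'frozen', False)  # True when running the PyInstaller build

# Let the launcher (or the user) pick which project's DB file to open.
DB_PATH = os.environ.get(
    'PROJECT_DB',
    r'\\fileserver\projects\default\db.sqlite3' if FROZEN else 'db.sqlite3',
)

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': DB_PATH,
    }
}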
If you go with MSSQL (or any client/server DB), the app will have to know the DB credentials (which means they could be extracted by a knowledgeable user). This presents a security risk that may not be acceptable. Basically, don't try to have the only layer of security within the app that the user is executing. The DB credentials used by the app should grant only the access that the user is allowed.

Pain of configuring various environments in development and production (Rails 4 application)

As per best practices, my development team does not store the application config file in a repo for security reasons (we use a config/application.yml file to store configs). However, when we actually develop and deploy, this causes some problems:
A developer needs to add a new external URL that is different depending on what environment the application is running in. Since there is no config file in the repo, he cannot update a single file that gets synced when another developer pulls the code. To make this happen, he updates his local config/application.yml file, then every other developer updates their local file, and then we have to add the new ENV variable to the server's config/application.yml. There has to be a better solution.
If we stored the config/application.yml file in the repo and shared it among everyone and the servers, this would solve the problem of sharing/updating global configs, BUT it would open up the possibility that a developer accidentally starts their local application in production mode and touches live data or spams real users with test emails (this has happened, which is why it's a concern).
Is there a standard best practice for solving these types of problems? It seems I have to sacrifice either productivity or security, and can't really have both.
I've been thinking about creating a config/development.yml file in the repo that all developers share, which stores all environments EXCEPT production. That way they can share config/ENV items for development and sync them up. But in production, I would have a config/production.yml file that ONLY lives on the servers.
If the application is started in anything except production environment, it loads the development.yml file. If it is started in production, it loads the production.yml file. But since the production.yml file does NOT live in the repo (only on the servers), there's no chance that a developer can accidentally touch live data or spam real users, etc...
Have any professional developers tried a scheme like this? I've done a lot of googling but really haven't found a satisfactory solution.
Check out the RailsConfig gem. This allows you to do exactly what you stated, but with the ease of a gem. It also allows you and your dev team to have local YAML files that override settings.
config/settings.yml
config/settings/#{environment}.yml
config/environments/#{environment}.yml
config/settings.local.yml
config/settings/#{environment}.local.yml
config/environments/#{environment}.local.yml
You would then just add config/settings/production.yml to your .gitignore so that it will not be checked into source control.

Should I have my Postgres directory right next to my project folder? If so, how?

I'm trying to develop a Django website with Heroku. Having no previous experience with databases (except the SQLite3 one from the tutorial), I thought it would be a good idea to have the following file structure:
Projects
'-MySite
|-MySite
'-MyDB
I'm finding it hard to figure out how to do it, with psql commands preferring to put the databases in some obscure directory instead. Perhaps it's not such a good idea?
Eventually I want to be able to test and develop my site (it'll be just a blog for a while, I'm still learning) locally (ie. add a post, play with the CSS) and sync with the Heroku repository, but I also want to be able to add posts via the website itself occasionally.
The underlying data files (MyDB) have nothing to do with your project files and should not live under your project.
EDIT: Added two ways to sync your local database with the database on the Heroku server.
1) export-import
This is the simplest way; do the following steps every now and then (a sketch follows the list):
make an export on the Heroku server by using the pg_dump utility
download the dump file
import the dump into your local database by using the psql utility
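As a rough sketch, the cycle could be scripted like this; all host, user, and database names below are placeholders, and since pg_dump connects to the remote host directly, the dump file lands on your machine:

# sync_db.py -- rough sketch of the export-import cycle; all connection
# details are placeholders
import subprocess

# 1) Export the Heroku database to a plain-SQL dump file with pg_dump.
subprocess.run(
    ['pg_dump', '-h', 'remote-host.example.com', '-U', 'remote_user',
     '-d', 'mydb', '-f', 'mydb.sql'],
    check=True,
)

# 2) Import the dump into the local database with psql.
subprocess.run(
    ['psql', '-h', 'localhost', '-U', 'local_user', '-d', 'mydb',
     '-f', 'mydb.sql'],
    check=True,
)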
2) replication
A more sophisticated way of keeping your local DB in sync all the time is replication. It is used in professional environments and is probably overkill for you at the moment. You can read more about it here: http://www.postgresql.org/docs/9.1/static/high-availability.html

How to continue development of a live Django web app?

I am building a Django powered web app that has a large database component. I am wondering, how would I go about continuing to develop the web app while users are using the live, production version? There are two parts to the problem, as I see it, as follows:
Making changes to templates, scripts, and other files
Making database schema changes
Now, the first problem is easy to manage with an SVN system. Heck, I could just have a "dev" directory which has all my in-development files and, once ready, just copy them into the "production" directory.
However, the second problem is more confusing to me. How do I test/develop new database changes without affecting the main/live database? I have been using South to do schema migrations during the initial creation stages of the web app, but surely I wouldn't want to make changes to the database while it is being used. Especially if I make changes that I don't want to keep.
Any thoughts/ideas?
You need another server on which to do your development. Typically, this is a personal machine, like your laptop. Often, you also have a copy of your production environment on a server, known as the staging server.
Your workflow would be like this:
Work on your code on your development machine, make all the changes you want, it's just you using it.
When the code is ready for production, you push it to the staging server to see that it really works properly in a server environment.
When you're sure it's ready for production, push it to the production server.