What is the best way to version control the db schema & metadata in Hasura Cloud? - database-migration

We've run into an issue where we have the db backed up, but the migrations got out of sorts, and as a result there are a lot of GraphQL queries in our frontend code that don't match up with the db relationships at all.
I'm new to the project, but it looks like people were just making changes in the Hasura console instead of via the CLI and committing migrations.
I'm going through and recreating relationships manually so they match up with the GraphQL queries in the frontend, but moving forward I'd like to ensure this doesn't happen again.
We'd also prefer to move everything from our Docker image on Heroku to Hasura Cloud if possible.
My question is:
Is there a standardized pattern for ensuring the db data, the db schema, and the Hasura [preferably Hasura Cloud] metadata are all version controlled?
Moreover, is there a way to enforce that pattern so other devs can't simply tweak things in the Hasura Console and everything gets out of sync again? 😬
Thank you so much in advance if you can help. 🙇‍♂️

https://hasura.io/blog/moving-from-local-development-staging-production-with-hasura/ is a good place to start.
I’d strongly encourage following that. Running Hasura locally through Docker is really easy to set up and hasura console will give you access to a localhost console that syncs your changes with metadata/migration files in your local repo. From there, just commit, review, merge, and take advantage of Hasura Cloud’s GitHub deploy if you can. If not, hasura deploy and a few environment variables are really all you need to roll out changes.
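For the deploy step itself, here's a minimal sketch of what a CI job might run, assuming the Hasura CLI v2 is installed and that the endpoint and admin secret are injected as environment variables (the variable names here are placeholders, not anything Hasura mandates):

# ci_deploy.py (sketch): apply the committed migrations and metadata to a
# Hasura (Cloud) instance. Assumes the Hasura CLI v2 is on PATH and that
# HASURA_ENDPOINT / HASURA_ADMIN_SECRET (placeholder names) are set by CI.
import os
import subprocess

def hasura_deploy():
    subprocess.run(
        [
            "hasura", "deploy",  # applies migrations, then metadata
            "--endpoint", os.environ["HASURA_ENDPOINT"],
            "--admin-secret", os.environ["HASURA_ADMIN_SECRET"],
        ],
        check=True,  # fail the CI job if the deploy fails
    )

if __name__ == "__main__":
    hasura_deploy()

Running something like this only from CI, never from developer machines against the shared instance, also goes a long way toward enforcing the "changes come through the repo" pattern you're after.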
As far as preventing devs from tweaking things in the console, if you’re talking about some deployed shared environment then honestly I think access to the console should be limited.

Related

Deployment best-practices in Apex

I have two instances of Apex, Development and Production.
My current process involves making updates on Development, exporting the app and then importing and overwriting the app on Production. But if I make any schema changes, I also have to go into the live production database and make those same changes as well.
Doing these changes on a live database is surely bad practice. I've worked with frameworks in the past that have elegant database migration tools that allow for rollbacks, etc., and provide a lot of peace of mind.
Does anything like this exist for Apex? Or is this the only way of deploying between separate instances?
Thanks!

Why do we need to set up AWS and a Postgres db when we deploy our app using Heroku?

I'm building a web API by following the YouTube video below, and up until the AWS S3 bucket setup I understood everything fine. But he first deploys everything locally, then after making sure everything works he transfers all the static files to AWS, and for the DB he switches from SQLite3 to Postgres.
django portfolio
I still don't understand this part: why do we need to put our static files on AWS and create a PostgreSQL database when there is already the SQLite3 default database from Django? I'm thinking that if I'm the only admin, just connecting my GitHub repo from Heroku should be enough, and any time I change something in the API I just need to push those changes to the GitHub master branch and that should be it.
Why do we need to use AWS to set up a static file location and set up an RDS (relational database) and do all of this from the beginning? I'm still not getting it!
Can anybody help explain this?
Thanks
Databases
There are several reasons a video guide would encourage you to switch from SQLite to a database server such as MySQL or PostgreSQL:
SQLite is great but doesn't scale well if you're expecting a lot of traffic
SQLite doesn't work if you want to distribute your app across multiple servers. Going back to Heroku, if you serve your app with multiple Dynos, you'll have a problem because each Dyno will use a distinct SQLite database. If you edit something through the admin, it will happen on one of these databases, at random, leading to inconsistencies
Some Django features aren't available on SQLite
SQLite is the default database in Django because it works out of the box, and is extremely fast and easy to use in local/development environments for prototyping.
However, it is usually not suited for production websites. Additionally, while it can be tempting to store your sqlite.db file along with your code, for instance in a git repository, it is considered a bad practice because your database can contain sensitive data (such as passwords, usernames, emails, etc.). Hence, a strict separation between your code and data is a good practice.
Another way to put it is that your code and your data have different lifecycles. You want to be able to edit data in your database without redeploying your code, and update your code without touching your database.
Even if you can remove public access to some files through GitHub, this is not a good practice, because when you work in a team with multiple developers, developers may have access to the code but not to the production data, because it's usually sensitive. If you work with 5 people and each one of them has a copy of your database, the risk of losing it or having it stolen is 5x higher ;)
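On the practical side, switching a Django project from SQLite to Heroku Postgres is mostly a settings change. Here's a minimal sketch using the dj-database-url helper commonly used on Heroku; this is one common approach, not necessarily exactly what the video does:

# settings.py (sketch): use Postgres in production via Heroku's DATABASE_URL
# config var, and fall back to the bundled SQLite file for local development.
# Requires: pip install dj-database-url psycopg2-binary
import dj_database_url

DATABASES = {
    "default": dj_database_url.config(
        default="sqlite:///db.sqlite3",  # local fallback when DATABASE_URL is unset
        conn_max_age=600,                # reuse connections between requests
    )
}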
Static files
When you work locally, Django's built-in runserver command handles the serving of static assets such as CSS, JavaScript and images for you.
However, this server is not designed for production use either. It works great in development, but will start to fail very quickly on a production website, which has to handle far more requests than your local version.
Because of that, you need to host these static files somewhere else, and AWS is one place where you can do that. AWS will serve those files for you, in a very efficient way. There are other options available, for instance configuring a reverse proxy with Nginx to serve the files for you, if you're using a dedicated server.
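For a Django project the usual way to do that is the django-storages package backed by S3. A minimal sketch follows; the bucket name and region are placeholders, and the credentials are expected as Heroku config vars:

# settings.py (sketch): serve collected static files from an S3 bucket via
# django-storages. Requires: pip install django-storages boto3
# Bucket name and region below are placeholders.
import os

INSTALLED_APPS += ["storages"]  # assumes INSTALLED_APPS is defined earlier in settings

AWS_STORAGE_BUCKET_NAME = "my-app-static"
AWS_S3_REGION_NAME = "us-east-1"
AWS_ACCESS_KEY_ID = os.environ["AWS_ACCESS_KEY_ID"]
AWS_SECRET_ACCESS_KEY = os.environ["AWS_SECRET_ACCESS_KEY"]

STATICFILES_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
STATIC_URL = "https://%s.s3.amazonaws.com/" % AWS_STORAGE_BUCKET_NAME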
As far as I can tell, the progression you describe from the video takes you from a local development environment to a more efficient and scalable production setup. That is to be expected, because it's less daunting to start with something really simple (SQLite, Django's built-in runserver) and move on to more complex and abstract topics and tools later on.

Rolling back migrations in Clojure after Docker Container Rollback

We have a Clojure service that runs in a Docker container in Amazon ECS. When the container is deployed, the Clojure service connects to the database and always runs migrations on startup.
The problem is that if we need to rollback the code deploy, the deployed container has the old code in it, and does not have access to the rollback migrations that the latest container has.
This is a problem that doesn't happen often, but when it does, how do we perform the DB Rollback?
The best we can think of right now is to do it manually.
Anyone have experience doing this programmatically?
It seems like you should consider separating your migrations from your actual deployable. Everyone will have their own preference for managing migrations, but you lose flexibility when you package your migrations into your application. A dedicated migration tool can act more intelligently on its own. For example, some database migrations are impossible to roll back without some sort of snapshot system, e.g. any migration that removes data. Additionally, it's bad practice for your application to have the permissions necessary to perform migrations. You also cannot easily audit which user performed the migration.
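To make that concrete, here's a rough sketch of what "migrations outside the deployable" can look like: plain SQL up/down files kept in their own repo and applied by a small standalone runner, run from CI or an operator's machine rather than from the application container. In practice you'd more likely reach for a dedicated tool (Flyway, Liquibase, Migratus, etc.); the file naming, the DATABASE_URL variable and the psycopg2 dependency below are all assumptions on my part:

# migrate.py (sketch): apply a single SQL migration file against the database.
# Lives in a separate migrations repo/pipeline, so a rollback never depends on
# which application image happens to be deployed in ECS.
# Assumes: pip install psycopg2-binary, a DATABASE_URL env var, and SQL files
# named like 003_add_orders.up.sql / 003_add_orders.down.sql (all placeholders).
import os
import sys

import psycopg2

def run_sql_file(path):
    with psycopg2.connect(os.environ["DATABASE_URL"]) as conn:
        with conn.cursor() as cur:
            with open(path) as f:
                cur.execute(f.read())

if __name__ == "__main__":
    # e.g. python migrate.py migrations/003_add_orders.down.sql   (to roll back)
    run_sql_file(sys.argv[1])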

On Heroku, is there danger in a Django syncdb / South migrate after the instance has already restarted with changed model code?

On Heroku, as soon as you push new code, the web-serving instances restart... even if the underlying database schema additions/changes (via syncdb or south migrate) haven't yet been applied.
In many cases, this might just cause harmless errors until the syncdb/migrate is run soon afterward. But I'm concerned that in some cases, new code might half-work, making unexpected changes in the pre-migration database.
What's the right way to be safe against this risk?
One technique might be to add the syncdb/migrate to the Procfile so it's run before the web restart. But in the case of multiple instances, or maybe even a case where the one old-code instance is left running until the moment the one new-code instance is known to be up, there's still a variant of the issue where code is talking to a DB with a mismatched schema.
Is there a 'hold all web instances' feature (or common best practice) for letting the migrate complete without web traffic?
Or am I being overly concerned about a risk that is negligible in practice?
The safest way to handle migrations of this nature, Heroku or no, is to strictly adopt a compatibility approach with your schema and code:
Every additive or transformative schema change must be backwards-compatible;
Every destructive schema change must be performed after the code that depends on it has been removed;
Every code change must either be:
durable against the possibility that associated schema changes have not yet been made (for instance, removing a model or a field on a model) or
made only after the associated schema change has been performed (adding a model or a field on a model)
If you need to make a significant transformation of a model, this approach might require the following steps:
Create a new database table to hold your new model structure, and deploy that migration
Create a new model with the new structure, and code to copy changes from the old model to the new model when the old model changes, and deploy that code
Execute a migration or code action to copy all old model data to the new model
Update your codebase to use the new model rather than the old model, deleting the old model, and deploy that code
Execute a migration to delete the old model structure from the database
With some thought and planning, it can be used for more drastic changes as well:
Deploy code that completely removes dependence on a section of the database, presumably replacing those sections of the site with maintenance pages
Deploy a migration that makes drastic changes that would not for whatever reason work with the above dual-model workflow
Deploy code that brings the affected sections back with the new model structure supported
This can be hard to organize and requires strict discipline and firm understanding of your code's interaction with your database, but in practice, it does allow for most changes to be made with no more downtime than the server restart itself imposes.
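As a concrete illustration of steps 2 and 3 above (dual-write plus a one-off backfill), here's a rough Django sketch. The OldProfile/NewProfile models and their fields are entirely hypothetical; the answer doesn't prescribe any particular implementation:

# Step 2 (sketch): whenever the old model is saved, mirror the change into the
# new table. Models and field names below are hypothetical.
from django.db.models.signals import post_save
from django.dispatch import receiver

from myapp.models import OldProfile, NewProfile  # hypothetical models

@receiver(post_save, sender=OldProfile)
def mirror_to_new_profile(sender, instance, **kwargs):
    new, _ = NewProfile.objects.get_or_create(legacy_id=instance.pk)
    new.display_name = instance.name
    new.email = instance.email
    new.save()

# Step 3 (sketch): one-off backfill of existing rows, run from a shell,
# management command, or data migration.
def backfill_profiles():
    for old in OldProfile.objects.iterator():
        new, _ = NewProfile.objects.get_or_create(legacy_id=old.pk)
        new.display_name = old.name
        new.email = old.email
        new.save()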
Looks like fast database changeovers are the way to go, but they require a dedicated database.
http://devcenter.heroku.com/articles/fast-database-changeovers
Alternatively, here's a tutorial for copying the data from one database (e.g., production) to another database (e.g., staging), doing the schema/data migration (e.g., using django/south), then switching the app to use the newly-updated database instance.
http://devcenter.heroku.com/articles/migrating-data-between-plans
Seems reasonable, but potentially slow if there's a large amount of data.
The recommended method is this:
Add database changes for your new features to your existing code
Make the existing code compatible with the new schema
Deploy
Add the new features to your codebase
Deploy
This means that your database changes are already in place when the code starts to require them.
However....
There are a couple of issues with this. The first is that I know of no development shop that is organised enough to handle this, as features just get built ad hoc; the second is that you're not really saving anything.
Generally speaking, unless you're making big changes to a massive database, your changes won't take long to apply and are usually over in a couple of seconds, which a developer can work around quite happily, issuing restarts etc. when needed. The risk is that a user might get an error page. If the changes are larger, you have some alternatives. One is using maintenance mode to turn the site off for a few seconds.
To be honest, there is no clear-cut way to handle this nicely, as by definition your code needs to be in place for your database changes to start. The best way I've found to approach the problem is to look at each change individually and work out the smoothest path for each on a case-by-case basis.
Rehearsing deployments on a staging environment will mitigate the risk of a deploy going bad, and give you an idea of the impact.
Heroku recently released "buildpacks" which are the scripts they use to set up an environment for your application, from managing dependencies to restarting the instances. Essentially it's a more comprehensive Procfile which you can customize.
You can fork the Python buildpack and modify the script to run in the sequence you want. Append the command you use to run syncdb to the end of bin/steps/django. Commit and put this repo on GitHub.
Unfortunately as of now it's not possible to modify the buildpack of an existing Heroku app, so you'll have to delete it and recreate one that points to your buildpack repo:
heroku create --stack cedar --buildpack git@github.com:...
This is the best solution because it
Doesn't cost anything at all
Doesn't require you to adapt your code to Heroku
Only syncs the db once per deployment
Hope this helps.

How to ensure database changes can be easily moved over DVCS using django

Overview
I'm building a website in Django. I need to allow people to begin to add flatpages and set some settings in the admin. These changes should be definitive, since that information comes from the client. However, I'm also developing the backend, and as such am creating and migrating tables. I push these changes to the hub.
Tools
django
git
south
postgres
Problem
How can I ensure that I get the database changes from the online site down to me on my lappy, and also how can I push my database changes up to the live site, so that we have a minimum of co-ordination needed? I am familiar with git hooks, so that option is in play.
Addendum:
I guess I know which tables can be modified via the admin. There should not be much overlap really. As I consider further, the danger really is me pushing data that would overwrite something they have done.
Thanks.
For getting your schema changes up to the server, just use South carefully. If you modify any table they might have data in, make sure you write both a schema migration and as necessary a data migration to preserve the sense of their data.
For getting their updated data back down to you (which doesn't seem critical, but might be nice to work with up-to-date test data as you're developing), I generally just use Django fixtures and the dumpdata and loaddata commands. It's easy enough to dump a fixture and commit it to your repo, then a loaddata on your end.
You could try using git hooks to automate some of this, but if you want automation I do recommend trying something like Fabric instead. Much of this stuff doesn't need to be run every single time you push/pull (in particular, I usually wouldn't want to dump a new data fixture that frequently).
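A rough Fabric sketch of that pull workflow (Fabric 1.x API; the host, project path and fixture choice are placeholders):

# fabfile.py (sketch): pull admin-edited data (flatpages here) from the live
# site down into the local dev database. Host and paths are placeholders.
from fabric.api import cd, env, get, local, run

env.hosts = ["deploy@example.com"]

def pull_flatpages():
    with cd("/srv/myproject"):
        run("python manage.py dumpdata flatpages --indent 2 > /tmp/flatpages.json")
    get("/tmp/flatpages.json", "fixtures/flatpages.json")
    local("python manage.py loaddata fixtures/flatpages.json")

Run it with fab pull_flatpages whenever you want fresh client data locally.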
You should probably take a look at South:
http://south.aeracode.org/
It seems to me that you could probably create a git hook that triggers off South if you are doing some sort of continuous integration system.
Otherwise, every time you do a push you will have to manually execute the migration steps yourself. Don't forget to put up the "site is under maintenance" message. ;)
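If you do wire this into a git hook, it can be as small as a post-merge hook that runs the South migrations after a pull. This is purely illustrative; a CI job or Fabric task is usually a better home for it:

#!/usr/bin/env python
# .git/hooks/post-merge (sketch): run South migrations after pulling new code.
# The hook file must be executable; illustrative rather than recommended.
import subprocess

subprocess.check_call(["python", "manage.py", "migrate"])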
I recommend that you use mk-table-sync to pull changes from live server to your laptop.
mk-table-sync takes a lot of parameters, so you can automate this process by using Fabric. You would basically create a Fabric function that executes mk-table-sync on each table that you want to pull from the server.
This means that you cannot make database changes yourself, because they will be overwritten by the pull.
The only changes that you would be making to the live database are using South. You would push the code to the server and then run migrate to update the database schema.
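A rough sketch of that Fabric function (Fabric 1.x; the DSNs, credentials and table list are placeholders, so check the mk-table-sync documentation for the exact options your setup needs):

# fabfile.py (sketch): pull selected admin-editable tables from the live
# database into the local one with mk-table-sync. Hosts, credentials and
# table names are all placeholders.
from fabric.api import local

TABLES = ["django_flatpage", "django_site"]

def pull_tables():
    for table in TABLES:
        local(
            "mk-table-sync --execute "
            "h=live.example.com,D=mydb,t=%s,u=sync_user,p=secret "
            "h=localhost,D=mydb,t=%s" % (table, table)
        )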