How to handle DB migration using AWS deployment tools - amazon-web-services

Amazon Web Services offers a number of continuous deployment and management tools such as Elastic Beanstalk, OpsWorks, CloudFormation and CodeDeploy, depending on your needs. The basic idea is to facilitate code deployment and upgrades with zero downtime. They also help you follow architectural best practices when using AWS resources.
For simplicity, let's assume a basic architecture with a two-tier structure: a collection of application servers behind a load balancer, and a persistence layer using a multi-AZ RDS database.
The actual code upgrade across a fleet of instances (app servers) is easy to understand. As a very simplistic overview, the AWS service upgrades each node in turn, draining connections so the instance being upgraded is not in use.
However, I can't understand how DB upgrades are managed. Assume that we are going from version 1.0.0 to 2.0.0 of an application and that there is a requirement to change the DB structure. Normally you would use a script or a library like Flyway to perform the upgrade. However, if there is a fleet of servers to upgrade, there is a point where both the 1.0.0 and 2.0.0 applications exist across the fleet, each requiring a different DB structure.
I need to understand how this is actually achieved (high level) to know what the best way/time of performing the DB migration is. I guess there are a couple of ways they could be achieving this but I am struggling to see how they can do it and allow both 1.0.0 and 2.0.0 to persist data without loss.
Suppose they migrate the DB structure with the first app node upgrade and at the same time create a cached copy of the 1.0.0 data. Users connected to a 1.0.0 app node persist to the cached copy, and users connected to a 2.0.0 app node persist to the newly migrated DB. Once all the app nodes are upgraded, the cached data is merged into the DB.
It seems unlikely they can do this as the merge would be pretty complex but I can't see another way. Any pointers/help would be appreciated.

This is a common problem to encounter once your application infrastructure grows to multiple application nodes. In the olden days, you could take your application offline for "maintenance windows", during which you could:
Replace application with a "System Maintenance, back soon" page.
Perform database migrations (schema and/or data)
Deploy new application code
Put application back online
In 2015, and really for several years now, this approach has not been acceptable. Your users expect 24/7 operation, so there must be a better way. Of course there is: the answer is a series of patterns for database refactoring.
The basic concept to always keep in mind is that you have to maintain two concurrent versions of your application, and there can be no breaking changes between them. That means you have the current application (v1.0.0) in production and a new version (v2.0.0) scheduled to be deployed, and both versions must work against the same schema. Once v2.0.0 is fully deployed across all application servers, you can then develop v3.0.0, which completes any remaining database changes (for example, dropping columns that only v1.0.0 needed).
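As a concrete illustration, here is a minimal sketch of that expand/contract idea using plain SQL driven from Python. The table and column names are made up for the example, and in practice each step would live in its own Flyway (or similar) migration script:

```python
import sqlite3  # stand-in for a connection to your real RDS database

conn = sqlite3.connect("app.db")
cur = conn.cursor()

# --- Shipped alongside v2.0.0 (the "expand" step) ---
# Add the new column as nullable so v1.0.0, which never writes it,
# keeps working against the same schema.
cur.execute("ALTER TABLE users ADD COLUMN full_name TEXT")

# Backfill the new column so v2.0.0 reads are correct from day one.
cur.execute(
    "UPDATE users SET full_name = first_name || ' ' || last_name "
    "WHERE full_name IS NULL"
)
conn.commit()

# --- Shipped alongside v3.0.0, once no v1.0.0 nodes remain (the "contract" step) ---
# Only now is it safe to drop the columns that v1.0.0 still depended on
# (uncomment against MySQL/Postgres; older SQLite lacks DROP COLUMN).
# cur.execute("ALTER TABLE users DROP COLUMN first_name")
# cur.execute("ALTER TABLE users DROP COLUMN last_name")
conn.commit()
```

Because each individual migration stays backward compatible with the version still running, there is never a point during the rolling deployment where one half of the fleet breaks the other half's reads or writes.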

Related

Optimal way to sync NoSQL (DynamoDB) with Elasticsearch (not the AWS managed one) in a local environment

I currently have AWS DynamoDB (port 8000) and Elasticsearch (port 9200) + Kibana (port 5601) running locally. I pretty much spent half a day figuring out how to sync these services together. To provide context, I have a Next.js app running that's integrated with both clients; however, when I upsert data into DynamoDB, I'd like it synced to Elasticsearch right away.
Note: for upper environments I plan on using AWS Lambdas integrated with DynamoDB Streams to update indexes in Elasticsearch instances (I am not a fan of the AWS managed Elasticsearch service).
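Roughly what I have in mind for that Lambda is sketched below (using the elasticsearch Python client for brevity; the index name and the "id" key field are placeholders, and the stream would need to be configured to include NEW_IMAGE):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint
INDEX = "items"                              # placeholder index name

def handler(event, context):
    """Triggered by a DynamoDB Stream; mirrors each change into Elasticsearch."""
    for record in event["Records"]:
        doc_id = record["dynamodb"]["Keys"]["id"]["S"]  # assumes a string key "id"

        if record["eventName"] == "REMOVE":
            es.delete(index=INDEX, id=doc_id, ignore=[404])
        else:  # INSERT or MODIFY
            image = record["dynamodb"]["NewImage"]
            # Flatten DynamoDB's attribute-value format into a plain document.
            doc = {k: list(v.values())[0] for k, v in image.items()}
            es.index(index=INDEX, id=doc_id, body=doc)
```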
Here is what I have tried thus far for local env syncing:
Logstash w/ DynamoDB plugin (https://github.com/awslabs/logstash-input-dynamodb) - the repo hasn't been updated in 4 years and has open issues about supporting newer versions of ES. It's been a headache all morning trying to get the thing running due to this and other issues --> https://github.com/awslabs/logstash-input-dynamodb/issues/10
Result: it's a lost cause as far as I know... I've even tried their Docker images, but they're not working.
node-scheduler: oh jeez... given I'm using Next.js, I'd have to create a custom server just to get this extra piece synced... not to mention it removes a lot of the important features of Next.js -> https://nextjs.org/docs/advanced-features/custom-server
Host the development DynamoDB in AWS (not locally) + provision ES on EC2 and point the local app there. I think this is overkill, as now pretty much everything is hosted except my app, which could be OK. Let me know your thoughts, as I already have a dev environment using a separate Cognito pool for authentication.
Just chain ES calls onto any updates made to DynamoDB. Can someone tell me why we can't do this, locally and in general? I get that data can drift out of sync, but maybe we can run hourly, or even twice-daily (off-peak), cron jobs to bulk-update the DynamoDB records into ES.
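To make that last option concrete, this is roughly what I mean by chaining the calls (a sketch using boto3 and the elasticsearch Python client; the table and index names are made up, and my app is actually Next.js, so treat this as pseudocode for the idea):

```python
import boto3
from elasticsearch import Elasticsearch

dynamodb = boto3.resource("dynamodb", endpoint_url="http://localhost:8000")
table = dynamodb.Table("items")            # placeholder table name
es = Elasticsearch("http://localhost:9200")

def save_item(item: dict) -> None:
    """Write the item to DynamoDB, then immediately mirror it into Elasticsearch."""
    table.put_item(Item=item)
    es.index(index="items", id=item["id"], body=item)
```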
Would really appreciate your thoughts. Thanks!

On-prem application migration to AWS

We are migrating some of our J2EE-based applications from on-prem to the AWS cloud. I am trying to find a good document on what steps to consider for the app migration. Since we already have an AWS account, and some of the applications have been migrated earlier, I don't have to worry about those aspects. However, I am thinking more about:
- Which app server to use?
- Do I need to migrate the DB as well, or just the app?
- Any licensing requirements for the app? We mostly use open source, so that should be fine.
- Operational monitoring after migrating to the cloud.
I came across some of these articles:
https://serverguy.com/cloud/aws-migration/
Migration Scenario: Migrating Web Applications to the AWS Cloud : https://d36cz9buwru1tt.cloudfront.net/CloudMigration-scenario-wep-app.pdf
I would like to know if you have done this kind of work, and whether you can point me to some helpful documents/links or share your own experience.
So there are two good resources I'd recommend for migration:
AWS Whitepaper for migration
AWS Well-Architected Framework.
The key is planning, but also not being afraid to experiment. This is the cloud, so don't worry about setting an instance size in stone; you can easily change it later.
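For instance, here is a rough sketch of resizing an existing EC2 instance with boto3 (the instance ID and target type below are placeholders):

```python
import boto3

ec2 = boto3.client("ec2")
instance_id = "i-0123456789abcdef0"  # placeholder

# The instance must be stopped before its type can be changed.
ec2.stop_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

# Switch to a larger (or smaller) instance type, then start it again.
ec2.modify_instance_attribute(
    InstanceId=instance_id,
    InstanceType={"Value": "m5.xlarge"},
)
ec2.start_instances(InstanceIds=[instance_id])
```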

Meteor Framework on AWS

I have an application developed in Meteor Framework.
We are planning to move it to AWS with a multi-AZ deployment, and we need a master/slave configuration for the MongoDB.
My question is how to achieve this. I believe MongoDB comes bundled with the framework itself; I've never worked with it, so any help will be appreciated.
Thanks
Welcome to Stack Overflow.
Mongo is bundled into the development environment, but not the server.
It is normal to host the database either on a separate server of your own, or using a database service (there are many around, such as compose.io, MongoLab, etc.), so Mongo can be set up for load balancing and scaling independently of the app itself.
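Once the database is hosted externally as a replica set, the Meteor app just points at it via its MONGO_URL environment variable. As a rough, hypothetical example of such a connection string (host names and replica-set name are made up), sanity-checked here from Python with pymongo:

```python
from pymongo import MongoClient

# Hypothetical three-member replica set hosted outside the app servers;
# the same URI would go into Meteor's MONGO_URL environment variable.
uri = (
    "mongodb://db1.example.com:27017,db2.example.com:27017,"
    "db3.example.com:27017/myapp?replicaSet=rs0"
)

client = MongoClient(uri)
print(client.admin.command("ping"))  # {'ok': 1.0} if the cluster is reachable
```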

Deploying WordPress on Elastic Beanstalk?

Suppose I create a site in WordPress, which is running on Elastic Beanstalk. Now, on the running app I will create posts/pages, upload images, etc. That is, some data, videos, files and records in a database will be added to the running application.
3 questions:
If WordPress is running on Elastic Beanstalk with multiple Amazon EC2 instances actually running my WordPress install, will those files propagate automatically to all running instances? And will this also happen if a new EC2 instance is fired up - for example, to handle increased load?
From what I see in the AWS console, I can deploy different versions of an app -- but as per the scenario above, if I deploy a new version, won't I lose all the content added directly to the running app (i.e. files and database records)? How do I keep that content and at the same time deploy a new version of my app?
The WordPress team keeps issuing upgrades. Can I directly upgrade my running WordPress install through the web interface? Or do I have to first upgrade my local version of WordPress and then upload the new version of the app to Beanstalk? If I have to upgrade my local version and then upload, then again I am back to point 1, i.e. changes made by users directly to the older version of the running app. How do I preserve those changes?
I've been working on this as well, and have learned a couple of things that are relevant here -- your question about uploads in particular has been on my mind:
(1) The best way to handle uploads, it seems to me, is to either go the NFS/NAS route as you suggest or, better still, use an Amazon S3 plugin for WordPress, so that any uploads are automatically copied up to S3 and the URLs in your WordPress media library point at your bucket rather than your specific site. That way you could have one or ten WP nodes in your Beanstalk, and media/images are independent of any one of those servers (see the sketch after this list).
(2) You should absolutely be using RDS here. Few things are easier to work with and as stress-free as a Multi-AZ, reserved MySQL RDS instance. Either that or your own EC2 running MySQL that is independent of the Beanstalk, but why run that when RDS is so much easier?
(3) Yes, you definitely have to commit changes to your Git repository or local files first (new plugins, changes to themes, WP upgrades) and then upload/install them as a new revision of the Beanstalk application code. Otherwise, the changes you make via the web interface on one node will never be in the code deployed to a new node -- in fact, you'll have an upgraded database but an older set of code in the Beanstalk application, so it's likely to create errors of one kind or another.
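As a rough illustration of what such an S3 plugin does under the hood (a sketch with boto3; the bucket name and paths are placeholders, and a real plugin also rewrites the attachment URLs stored in the database):

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-wp-media-bucket"  # placeholder bucket name

def offload_upload(local_path: str, key: str) -> str:
    """Copy a media upload to S3 and return the bucket URL to store in WordPress."""
    s3.upload_file(local_path, BUCKET, key, ExtraArgs={"ACL": "public-read"})
    return f"https://{BUCKET}.s3.amazonaws.com/{key}"

# e.g. offload_upload("/var/app/current/wp-content/uploads/2015/06/logo.png",
#                     "uploads/2015/06/logo.png")
```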
I took an AWS architecture course, and their advice for EC2 and the Beanstalk is to start to think about server instances as very disposable -- so you should try to think about easy ways for your boxes to provision themselves in the bootstrapping process and to take over work for one another without any precious resources on just one box. So losing an instance should never be a big deal. (This is definitely not how we thought in the world of physical servers, where we got everything tweaked 'just so'.)
Good luck!
Well, I'm no expert, but since no one has answered, I'll give it my best shot.
(1) You are absolutely right -- kind of. While each EC2 instance does have some local storage, it is destroyed and reset with each new instance. Because of this, Amazon has things like Elastic Block Store and S3 for persistent files. I don't know how one would configure WP to utilize this, but that will likely be the solution.
(2) I think this problem is solved by my answer to #1. As for the database, all of your EC2 instances should be pulling from the same RDS location. Again, while you could have MySQL running on each EC2 instance, in the interest of persistence, having a separate database makes more sense.
(3) You, again, have most everything right. Local development should always precede live deployment. Upgrading locally and then pushing to the live servers will make sure all of your instances remain identical.
Truth be told, I am still in the process of learning all of this too; as I said, I'm not an expert. Hopefully someone else will come along and give a more informed answer. However, the key conceptual hurdle here is the idea of elastic scalability -- and the salient point of this idea is the separation between what is elastic/scalable/disposable and what is persistent.
Hopefully that helps.
I have deployed a small WordPress site on EB, S3 and RDS. S3 holds all static data, such as media uploads; this works through a plugin. RDS holds the database. EB holds the latest deployed application. The application is deployed from my dev environment with a build script. This way, I just have to press one button to redeploy.
I wrote an article about it here: http://www.cortexcode.com/wordpress-to-aws-code-example/
While it was at first annoying to work with, the speed of AWS is nice, and now it's easier than ever. It used to be that I had to upload a bunch of files over FTP; this is way more efficient. :-)
As an addition to all the great answers already:
1) I can highly recommend EFS, but also S3, for media files, so they are served with high availability in combination with CloudFront. For WordPress there is one plugin that really speeds this up (not affiliated with them, I just really like the plugin). There is also an assets plugin if you'd like to serve JS and CSS files from S3. For the EFS solution, take a look at the AWSlabs docs on GitHub, and specifically this file on how they mount the uploads directory.
In general, EBS is really great for WordPress, but you'll need a different mindset compared to other hosting solutions (shared hosting, managed hosting).
OK, I researched this particular issue a lot, and this is what I learned:
(1) If a WordPress user uploads some files, those files are uploaded only to the virtual machine that is actually serving his request at that time. E.g., if the WordPress site is currently cloud-deployed and using 5 virtual machines, then when a user makes a request he is directed to one virtual machine -- the one with the lowest load at that point -- and his uploads are stored only on that server. Current platform-as-a-service solutions (like Amazon Elastic Beanstalk and App Fog) do not have the ability to propagate such changes to all the running instances. Either propagating changes to all servers, or having all running instances use common storage, are the only two solutions to this problem. (An example of common storage would be all 5 running virtual machines using network-attached storage (NAS).)
(2) With reference to currently available platforms like Amazon Elastic Beanstalk and App Fog: even if a user makes changes directly to the running app, these platforms rely on the local version of the code (which the admin initially deployed to the cloud), and there is no way to update that local version of the code (on the admin's PC) with the changes made by a user to the running app -- hence those changes, i.e. files, are lost. Similarly, changes made to the database by users of the running app are also lost, unless the admin is using exactly the same database for his local app as the one he deployed to the cloud.
(3) Any changes to running apps first have to be made to the local app on the admin's PC and then pushed to the cloud.
I am working on a cloud PaaS that addresses all these concerns -- viz., updates can be made to running apps, and code changes made to the running app are also reflected in a code repository accessible by the user. The proof of concept is ready; hopefully it will be as good as I hope it should be :) -- currently the only thing that is actually there is the website (anyacloudpanel.com) -- design work is going on :)
If there is some rule that I should not mention my website (Anya Cloud Panel), then I am sorry -- please feel free to edit and remove my website URL from my answer :)
Thanks,
Arvind.
Deploying WordPress to AWS Elastic Beanstalk does require some changes to the normal WordPress deployment, as mentioned here a few times. To answer your questions, here is a great tutorial explaining stateless applications and how to deploy to Elastic Beanstalk:
Deploying WordPress to Amazon Web Services AWS EC2 and RDS via ElasticBeanstalk
Be careful if you use a theme from ThemeForest, for example. Some of them are incompatible with the WordPress S3 plugin. Then you're stuck: you cannot deploy your WordPress site to the cloud.

Options for maintaining MySQL databases for a Django development team

What are some options to avoid the latency of pointing local Django development servers to a remote MySQL database?
If developers use local MySQL databases to avoid the latency, what are some useful tools to sync schema updates of the remote db with the local db and avoid manually creating, downloading, and loading dumps?
Thanks!
One possibility is to configure the remote MySQL database to replicate to each developer's local machine - assuming you have control of the remote database's configuration.
See the MySQL docs for replication notes. Using MySQL replication, the remote node would be the master and the developer machines would be slaves. The main advantage of this approach is that your developer machines would always remain synchronized with the master database. One possible disadvantage (depending on the number of developer machines you are slaving) is a degradation in the remote database's performance due to the extra load introduced by replication.
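A minimal sketch of the slave-side setup, run against each developer's local MySQL instance (host, credentials, and binlog coordinates below are placeholders, and the remote master must already have binary logging and a replication user configured):

```python
import mysql.connector  # pip install mysql-connector-python

# Connect to the developer's *local* MySQL instance, which will act as the slave.
local = mysql.connector.connect(host="127.0.0.1", user="root", password="secret")
cur = local.cursor()

# Point the local slave at the remote master (placeholder values throughout).
cur.execute("""
    CHANGE MASTER TO
        MASTER_HOST = 'remote-db.example.com',
        MASTER_USER = 'repl',
        MASTER_PASSWORD = 'replica-password',
        MASTER_LOG_FILE = 'mysql-bin.000001',
        MASTER_LOG_POS = 4
""")
cur.execute("START SLAVE")

# Check replication health; Slave_IO_Running / Slave_SQL_Running should be 'Yes'.
cur.execute("SHOW SLAVE STATUS")
print(cur.fetchone())
```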
It sounds like you want to do schema migrations. Basically, it's a way to log schema changes so that you can update, and even roll back, along with your source changes (if you change a model, you also check in a new migration that has up and down commands). While this will likely become an official Django feature at some point, there are several third-party solutions to choose from. It's really a personal preference; here are some popular ones (see the sketch after this list):
South
Django Evolution
dmigrations
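For example, a minimal South migration looks roughly like this (the app, table, and column names are made up for illustration):

```python
# myapp/migrations/0002_add_subtitle.py
from django.db import models
from south.db import db
from south.v2 import SchemaMigration


class Migration(SchemaMigration):

    def forwards(self, orm):
        # Applied with: ./manage.py migrate myapp
        db.add_column('myapp_book', 'subtitle',
                      models.CharField(max_length=200, null=True),
                      keep_default=False)

    def backwards(self, orm):
        # Rolled back with: ./manage.py migrate myapp 0001
        db.delete_column('myapp_book', 'subtitle')
```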
I use a combination of South for schema migrations, and storing JSON fixtures (or SQL dumps) of useful test data in the VCS repo for the project. Works pretty seamlessly.