Advice for AWS storage setup (php/mysql with autoscaling) - amazon-web-services

I have a php/mysql website that I want to deploy on AWS. Ultimately, I'm going to want auto-scaling (but don't need it right away).
I'm looking at an EBS based AMI. I see that by default the "Root Device Volume" is deleted when an instance "terminates". I realize I can also attach other EBS devices/drives to an instance (that will persist after termination) but I'm going to save most user content in S3 so I dont think that's necessary. I'm not sure how often I'll start/stop vs when i would ever want to terminate. So that's a bit confusing.
I'm mostly confused with where changes to the system get saved. Say I run a YUM install or update. Does that get saved in the "root device volume"? If i stop/start the instance, the changes should be there? What about if I setup cron jobs?
How about if I upload files? I understand to an extent that it depends where I put the files and if I attached a second EBS. Say I just put them in the root folder "/" (unadvised, but for simplicity sake). I guess that they are technically saved in the "root device volume"? If I start/stop the instance, they should still be there?
However, if I terminate an instance, then those changes/uploads are lost. But if I set the "root device volume" to not delete on termination, then I can launch a new instance with the changes there?
In terms of auto-scaling. Someone said to leave the "root device volume" to default delete so that when new instances are spun up/shut down, they don't leave behind zombie EBS volumes that are no longer needed (and would require manual clean-up)?
Would something like this work: ?
Setup S3 bucket (for shared image uploads)
Setup Amazon RDS / mysql
Setup DynamoDB (for sharing php sessions)
Launch EBS-backed AMI (leave as default to delete "root device volume" on
termination). Make system updates using yum/etc. Upload via sftp
PHP/HTML/JS/CSS files (ex: /var/www/html). Validate site can save
images to S3, share sessions via DynamoDB, access mysql via RDS.
Make/clone your own AMI image from your currently running/configured
one. Save it with a name that indicates site version/date/etc.
Setup auto-scaling to launch the image created in #5
I'm mostly concerned with how to save my configuration so that 1) changes are saved in-case i ever need to terminate an instance (before using autoscale) and 2) that auto-scaling will have access to the changes when I'm ready for it. I also don't want something like the same cron-job running on all auto-scaling instances.
I guess I'm confused with "does creating my own AMI image in #4" basically replace the "saving EBS root device volume" on termination? I can't wrap my head around the image part of things vs the storage part of things.
I get even more confused when I read about people talking about if you use "Amazon Linux" then the way they deploy updates every 6 months makes it difficult to use because you are forced to use new versions of software. How does that affect my custom AMI (with my uploaded code)? Can I just keep running yum updates on my custom AMI (for security patches) and ignore any changes to amazon's standard AMIs? When does the yum approach put me at risk for being out-of-date?
I know there are a host of things I'm not covering (dns/static IPs/scaling metrics/etc). That instead of uploading files then creating an AMI image, some people have their machine set to pull files from git on startup (i dont mind my more manual approach for now). Or that i could technically put the php/html/css/js on S3 too.
Sorry for all of the random questions. I know my question might not even be totally clear, but I'm just looking for confirmation/advice in-general. There are so many concepts to tie-together.
Thanks and sorry for the long post!

Yes, if you install packages, upload files, set up cron jobs, etc. and then stop the EBS-based instance, everything will still be there when you restart it.
Consequently, if you create an AMI from that instance and then use it for your autoscaling group, all the instances of the autoscaling group will run the cron jobs.
Your steps look good. As you are creating an AMI, your changes will be saved in that AMI. If the instance is terminated, it can be recreated via the AMI. The modifications made on that instance since the AMI creation will not be saved though. You need to create an AMI or take a snapshot of the EBS volume if you want a backup.
If you make a change and want to apply it to all the instances in the autoscaling group, you need to create a new AMI and apply it to your autoscaling group.
Concerning the cron jobs, I guess you have 2 options:
Have 1 instance that is not part of the autoscaling group running them (and disable the cron jobs before creating the AMI for the autoscaling group)
Do something smart so only one instance of the autoscaling group runs them. Here is the first page I hit on Google: https://gist.github.com/kixorz/5209217 (not tested)
Yes, creating your own AMI image basically replaces the "saving EBS root device volume" on termination.
An EBS boot AMI is an EBS snapshot of the EBS root volume plus some
metadata like the architecture, kernel, AMI name, description, block
device mappings, and more.
(From: AWS Difference between a snapshot and AMI)
Yes, you can automatically run yum security updates. To be completely identical to the latest Amazon Linux AMI, you should run all yum updates (not only security). But I wouldn't run those automatically.
Let me know if I forgot to answer some of your questions or if some points are still unclear.

Related

Do I have to reinstall software packages after restoring from EBS Snapshot on AWS

I am having trouble fully understanding the differences between EBS Snapshots and AMI's in AWS. I understand how to create them and the terminology (Snapshots are backups of Volumes or the disk attached to an EC2 Instance and AMI is the backup of the full EC2 instance with snapshots at given time). But, I'm not sure what exactly is saved in the snapshot.
Assume I have an EC2 instance with LAMP stack installed on it and some data from the database. I understand the snapshot will save all my data and website files. When I create the new EC2 instance and attach the volume backed by my snapshot, do I then have to install Apache, MySQL, and PHP again? Or is that all saved in the snapshot? I am not worried about having to redo the security settings, instance type, etc for the new EC2 instance, but am worried all the configuration files are lost unless I have an AMI.
I understand how to create them and the terminology (Snapshots are backups of Volumes or the disk attached to an EC2 Instance and AMI is the backup of the full EC2 instance with snapshots at given time)
This is kinda correct, but not exactly. A better understanding of the concepts may help you in general, so here it goes.
A snapshot represents the state of an EBS volume at an exact point in time, taken atomically, that you can use to create other EBS volumes from. Those new EBS volumes, created from the snapshot, will start with the exact same content as the original EBS volume at the time the snapshot was taken. So, kind of a "backup of volumes").
One extremely important thing about Snapshots, though, that you seem to be missing based on the rest of your question, is that Snapshots are immutable. Once create, the contents of a snapshot cannot change, ever.
An AMI is an Image — it's used to create an instance from. It's a bunch of metadata, which includes the type of hypervisor required, accounts allowed to use it, and ultimately it contains a list of EBS Snapshots that should be used to create volumes, and where exactly to attach the volumes created from those snapshots.
So, while "creating an AMI" is indeed a strategy some people use to create a "backup of the full EC2 instance", it's not exactly the same thing.
When I create the new EC2 instance and attach the volume backed by my snapshot [...]
I think you're confusing some concepts here.
When a new EBS volume is created from a snapshot (either by you creating the EBS volume directly and selecting a snapshot, or by using an AMI, which implicitly means that EBS volumes are going to be created from the Snapshots as specified in the metadata that the AMI represents), there's no relationship between that snapshot and the new volume anymore. Any changes made to the volume are completely independent of the original snapshot. Remember: snapshots are immutable.
So, this notion of "the volume backed by my snapshot" may be quite unhelpful: there's no link between them.
Hoping this is clear, let's move on...
When I create the new EC2 instance [...], do I then have to install Apache, MySQL, and PHP again? Or is that all saved in the snapshot?
The software that will come pre-installed in the EC2 instance is defined by the contents of the snapshots used when the EBS volumes attached to the instance were created.
In other words, if you create an EC2 instance from a standard AMI (not one that you create) that doesn't come with the software pre-installed, then the software won't be installed.
If you create an AMI from your instance before you install the software you want, then the AMI (or, more precisely, the snapshots referenced by the AMI) won't have the software, so any instances created from that AMI won't come with the software.
Now, here's what you are probably looking for:
If you (1) create an instance, then (2) install the software, and only then (3) you create an AMI, in this order, then any new instances created from that AMI will come with the software pre-installed.
The easiest way to fully and truly understand this is to remember that (A) Snapshots are immutable, and (B) AMIs are immutable references to Snapshots.
All that said, while creating a custom AMI is indeed a completely valid and popular approach to the problem you are dealing with, there's another approach that's also completely valid and popular, but that is more flexible, and could be worth investigating.
Instead of having to create and manage AMIs (*), what you could do is use a user data script.
Essentially, what it does is it allows you to have a script, that will execute as root, on the first boot of a newly created instance. This script is often used to install software, update packages, download configuration files, etc.
This way, if you need to change versions of software, or change configuration files, etc, you don't need to go through all the complexity of managing AMIs. Instead, you simply update the script.
For more info on user data, check this out: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html
(*) "manage AMIs": remember that an AMI is an immutable set of metadata which includes reference to Snapshots, which are also immutable. So if you ever need to change some settings, or have new versions of the software be installed, you will need to create new AMIs. This could become something quite complicated and time-consuming. In fact, there's a ton of tools out there, some by AWS, some by other companies, that try to simplify the process of creating and managing AMIs. So just be aware that, while doable, valid, and popular, it's an approach that can get messy!

How to take a backup of EC2 instance in AWS and move to a low cost alternative?

We have an EC2 instance running in AWS EC2 instance. We have our ML algorithms and data that. We have also hosted a web-based interface also in that machine.
Now there are no new developments happening in that EC2 instance. We would like to terminate AWS subscription for a short period of time (for the purpose of cost-reduction and exploring new cloud services). Most importantly, we want to be in a position where we can purchase a new EC2 instance with a fresh AWS subscription, use the backup which we take now, and resume all operations (web-backend, SMS services for our app which is hosted in AWS, etc.).
What is the best way to do it? Is temporary termination of AWS subscription advisable?
There is no concept of an "AWS Subscription". AWS is charged on-demand, which means you only pay when you use resources.
If you temporarily do not want the Amazon EC2 instance, you could:
Stop the instance, which is like turning off the power. You will not be charged for the instance, but you will still pay for the disk storage attached to the instance. You can simply Start the instance again when you wish to use it. You will only be charged while the instance is running. OR
Create an image of the instance, then terminate the instance. This will create an Amazon Machine Image (AMI), which contains a copy of the disks. You can then launch a new Amazon EC2 instance from the AMI when you wish to use it again. This is a lower-cost option compared to simply stopping the instance, but it takes more effort to stop/start.
It is quite common for companies to stop Amazon EC2 instances at night or over the weekend to reduce costs while they are not needed.
EDIT: Just thought of a third option. Will test it and be back. Not worth it; it would involve creating an image from the EC2 instance and then convert that image to a VM image, storing the VM image in S3. There may be some advantages to this, but I do not see them.
I think you have two options, both of them very reasonably priced. If you can separate the data from the operating system, then your best option would be to use an S3 bucket as a file system within the EC2 instance. Your EC2 instance would use this bucket to store all your "ML algorithms and data" and, possibly, even your "web-based interface". Whenever you decide that you no longer need the processing capacity of the EC2, you would unmount the S3 bucket file system from the EC2 instance and terminate that instance. After configuring an appropriate lifecycle rule for the S3 bucket, it would transition to Glacier, or even Glacier Deep Archive [you must considerer the different options of long term storage]. In the future, whenever you want to work with your data again, you would move your data from Glacier back to S3, create a new EC2 instance, install your applications, mount your S3 bucket as a file system and you would have access to all your data. I think this is your least expensive and shortest recovery time objective option. To implement this option, look at my answer to this question; everything you need to use an S3 bucket as a regular folder inside the EC2 instance is there.
The second option provides an integrated solution, meaning the operating system and the data stay together, and allows you to restore everything as it was the day you stopped processing your data. It's made up of the following cycle:
Shutdown your EC2 and make a note of all the specs [you need them further down].
Export your instance to a virtual image, vmdk for example, and store it in your S3 bucket. Something like this:
aws ec2 create-instance-export-task --instance-id i-0d54b0682aa3998a0
--target-environment vmware --export-to-s3-task DiskImageFormat=VMDK,ContainerFormat=ova,S3Bucket=sm-vm-backup,S3Prefix=vms
Configure an appropriate lifecycle rule for the S3 bucket so that it transitions to Glacier, or even Glacier Deep Archive.
Terminate the EC2 instance.
In the future you will need to implement the inverse, so you will need to restore the archived S3 Object [make sure you you can live with the time needed by AWS to do this]
Import the virtual image as an EC2 AMI, something like this [this is not complete - you will need some more options that you saved above]:
aws ec2 import-image --disk-containers
Format=ova,UserBucket="{S3Bucket=sm-vm-backup,S3Key=vmsexport-i-0a1c382e740f8b0ee.ova}"
Create an EC2 instance based on the image and you're back in business.
Obviously you should do some trial runs and even automate the entire process if it's something that will be done frequently. I have a feeling, based on what you said, that the first option is a better option, provided you can easily install whatever applications they use.
I'm assuming that you launched an EC2 instance from a base Amazon Machine Image and then added your own software and models to it. As opposed to launched an EC2 instance from an AWS Marketplace offering.
The simplest thing to do is to create an Amazon Machine Image (AMI) from your running EC2 instance. That will capture the current state of the instance and persist it in your AWS account. Then you can terminate the instance. Later, when you want to recreate it, launch a new instance, selecting the saved AMI instead of a standard AMI.
An alternative is to avoid the need to capture machine state at all, by using standard DevOps practices to revision-control everything you need to recreate the state of a running machine.
Note that there are costs associated with an AMI, though they are minimal ($0.05 per GB-month of data stored, for example).
I had contacted AWS customer care regarding this issue. Given below is the response I received. Please add your comments on which option might be good for me.
Note: I acknowledge the AWS customer care team for their help.
I understand that you require some information on cost saving for your
Instance since you will not be utilizing the service for a while.
To assist you with this I would recommend checking out the Instance
Stop/Start link here:
==>https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Stop_Start.html .
When you stop an Instance, you do not lose any data & you are not
charged for the resources any further. However please keep in mind
that you will still be charged for any EBS Storage Volumes attached to
the stopped Instance(s).
I also recommend checking out the below links on how you can reduce
your costs.
==>https://aws.amazon.com/premiumsupport/knowledge-center/reduce-aws-bill/
==>https://aws.amazon.com/blogs/compute/10-things-you-can-do-today-to-reduce-aws-costs/
That being said, please note that as I am in the billing department,
for the best assistance with the various plans you will require the
assistance of our Sales Team.
The Sales Team will be able to assist with ways to save while
maintaining your configurations.
You will be able to reach the Sales Team here:
==>https://aws.amazon.com/websites/contact-us/.
Once you have completed the details in the link, a member of the team
will be in touch with you at their soonest.

Revert Amazon EC2 instance to snapshot?

Is there a convenient way to roll-back an EC2 instance to a previously saved snapshot in the same manner that you can do so with VMWare and other virtualisation platforms. In my investigations so far, it seems you have to deploy a new instance and select the snapshot as the starting volume.
I am doing a lot of testing with new EC2 instance initialisation scripts at present, and having to configure and deploy a new instance for every test is tedious and costly. If I can roll-back to a snapshot of the initial state of the system quickly, this would save a lot of time and effort.
The answers by John and Stefan are both correct. There's no way to trigger a simple "Roll this EC2 instance back to an earlier snapshot" feature on AWS.
There is a way to "roll back" an instance's filesystem to a snapshot by restoring the snapshot to a new EBS volume, detaching and deleting the old one, and attaching the new one.
And, of course, AWS is eminently automatable. You could definitely write your own automation to make that happen.
Having said all of that, if you're trying to test instance creation scripts, I have to agree with John, tearing down and rebuilding the instance is the most reliable way to make sure you're testing it accurately, and shouldn't really be more costly than restoring to a snapshot.
The other path you might consider, particularly if you want the instance to start in a known state that doesn't match a particular predefined AMI, is to build an AMI of your own (e.g. w/ Packer) and use that as the basis for your test. Then instead of restoring to a snapshot, you're creating a new instance from an AMI you've prepared.
No. There is no concept of "roll-back" with Amazon EC2.
If you are using Amazon Linux, deploying a new instance shouldn't be costly. It is charged per-second. You can script it so it isn't so tedious.
Simple answer is no.
If your EC2 instance is backed by an EBS volume however you could create a new volume from the snapshot, detach the old volume and reattach the new one.

How to properly configure a web application instance with autoscaling?

Last day I wanted, according to AWS recommendations, put my ec2 instance inside of an autoscaling group. I created my ec2 instance by using the standard linux AMI instance and then I installed a full LAMP server.
The next morning I tried accessing my apache and guess what? My LAMP wasn't there anymore! Everything was wiped away.
I guess this is because, for some reason, the autoscaling group deleted my instance and recreated it vanilla.
Now I still want to autoscale my instance but, of course, I want to keep my LAMP and the stored data.
So here's my questions:
How to create a customized image starting from my actual instance?
Would it be correct to create the mysql DB using AWS RDS so to not keep it linked to my instance?Is it more or less expensive than dedicating a EBS storage?
I also want to keep my /var/www/html data somewhere shared between instances: while it is true that, on production, I won't update those files often it is also true that I don't want to lose them just because the autoscaling resets my instance state. I also don't want to re-create an image each time I update said files... What's the best way?Maybe an s3 bucket? Or, still, an EBS storage shared between instances?
I would assume that the reason that your "LAMP [server] wasn't there anymore" was because the web server failed health checks and was terminated and replaced by AutoScaling.
Elastic Beanstalk would be a good way to manage some of the complexity here. If that's not an option then you should read up on AutoScaling, ALB, and health checks.
In response to your specific questions:
you can create an Amazon Machine Image (AMI) from an instance. When you, or AutoScaling, launch a new instance from that AMI, you can get the instance up to date by running a script in userdata
move the DB from the web/app server to RDS, or to a DB server that you manage yourself
maintain the html/js/css etc. in S3 and sync them to your web server periodically (there are other options, but that's simple)

I need help duplicating Amazon AWS EC2 instances

I'm just getting started with AWS EC2 and not entirely sure I understand it.
From what I've read, an instance is basically a virtual server, and you should be able to somehow "duplicate" that virtual server from the AWS console somehow. Then use Load Balancer or Elastic IP to route requests to one or the other.
The problem comes in when I try to "duplicate" my instance. I tried a million things, but the only thing that got me close was creating an AMI of my current instance then launching an instance from that, but when I did that, the new instance was basically the default server config. None of my files were there.
What am I doing wrong?
You don't really "duplicate" the instance. You more copy it as a "blueprint". Then when you boot an instance later, you can base that instance off of your snapshot or "blueprint".
The ELB can be configured to point at any instance you want, so when you boot a new server off this snapshot/"blueprint" it can be automatically added to the ELB.
Now that is cleared up, to answer the question:
I would make sure to use EBS backed instances. You can find them all over. But not S3 backed. If they EBS backed then the exact volume with all your configs will be there.
I would make sure your instance is configured how you like it and has proper scripts installed for when it boots up. You will want your services started, config files pulled down from repositories, etc. The config files should be there, but I would not rely on that. If you have them in a repository and then make a startup script to pull them down and copy them where you want, you will be in much better shape.
With the instance running and selected, click on the instance actions drop down and click "Create AMI"
The instance will REBOOT. So be careful.
Launch a new instance. And pick the AMI/Snapshot that #3 created.
Done. Check this https://stackoverflow.com/a/8919031/667608 that could help with the above.
Oh, one other thing, if you have any EBS Volumes attached, they will also be copied, but you will need to mount them once the server boots.
Under instances, click on the image you want to duplicate and then go to instance action(its near the top) and create ami.
This creates a snapshot of your image as it is right now. Then when you need to add more power, you can simply launch that ami and have the load balancer distrubute the traffic between those ami's.
On a side note, unless really required, I would not suggest you store data on the ami if its changing and you plan to use it on another launched ami. You'll pretty much have to keep taking ami snapshots to update it with the new data, so instead figure a way to maintain state somewhere else(not sure about your data but you can consider a database, s3, or another server that these servers can mount to get the same data).
Hope that helps!
Create a webserver AMI using EBS backed instance. This will serve as your template for running multiple web-server instance later.
For the app codes, depending on your strategy and amount of files to transfer, you can pull them from S3 or git or maybe using a centralized filesystem such as NFS.
Configure the ELB, add one or more web server instances to it. CNAME your ELB's public dns to your www.domain.com.