AWS EC2 get the AMI size - amazon-web-services

I am using EC2 and creating a custom AMI from a snapshot of my machine. I can't find how to get its actual size on disk so that I can calculate what it will cost me to store it.
How can I get the actual compressed size or estimated cost for storing an AMI (a snapshot of a machine)?

AMI sizes are not easy to calculate. This is because they are based on Amazon EBS snapshots, and EBS snapshots are incremental in nature.
For example, let's say:
1) You launch an instance from an Amazon Linux AMI.
2) You then log in and create a file on the disk.
3) You then create a new AMI.
Any blocks that have been added or changed (eg for the new file on the disk) will be included in your AMI (or, more accurately, in the snapshot that stores the AMI data). However, all the blocks that were not modified from the original AMI are not stored in your AMI. Instead, the AMI will contain a reference to the blocks in the original AMI. This is due to the incremental nature of snapshots.
So, the reality is that most of an AMI (most of a snapshot) actually contains pointers to existing data and it therefore is not charged to you. You will only pay for storage that does not already exist.
That's why you can't really get the storage size of an AMI.
The only way to know the storage size for sure would be to create a totally new AMI that is not based on an existing AMI, since you would then be charged for the total size. (I wouldn't recommend doing so.)
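As a rough cross-check on the point above, the EBS direct APIs can list the blocks that a snapshot references. The sketch below is a minimal example, assuming a boto3 `ebs` client is passed in; the helper name `snapshot_block_bytes` is hypothetical and not from the original answer. The result counts every block the snapshot references, including blocks shared with earlier snapshots, so it should be read as an upper bound rather than as the billed amount.

```python
from typing import Any, Dict


def snapshot_block_bytes(ebs_client: Any, snapshot_id: str) -> int:
    """Return (number of referenced blocks) * (block size) for one snapshot.

    `ebs_client` is assumed to behave like boto3.client("ebs").
    This is an upper bound on storage, not a billing figure.
    """
    total_blocks = 0
    block_size = 0
    kwargs: Dict[str, Any] = {"SnapshotId": snapshot_id}
    while True:
        page = ebs_client.list_snapshot_blocks(**kwargs)
        total_blocks += len(page.get("Blocks", []))
        block_size = page.get("BlockSize", block_size)
        token = page.get("NextToken")
        if not token:
            break
        # Follow pagination until no NextToken is returned.
        kwargs["NextToken"] = token
    return total_blocks * block_size
```

With real credentials this might be called as `snapshot_block_bytes(boto3.client("ebs"), snapshot_id)`, where the snapshot ID comes from the AMI's block device mappings.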

Related

Why is AMI creation taking a long time?

I am trying to create an AMI from an instance with a root device of 160GB in size. This root volume is of type io1 with 1250 IOPS.
In my AWS account, creating an AMI takes about 5 minutes. This is with more than 100GB of data.
On the customer's AWS account, the same configuration takes over 20 minutes.
I have repeated this test many times and get nearly the same results every time.
Any idea why AMI creation time varies so much between AWS accounts?

An AMI consists of snapshots of Amazon EBS volumes attached to the instance.
Snapshots consist of "differences" from the previous snapshot (including the original AMI that was used to launch the instance).
For example, if you were to launch a new instance from an AMI and then immediately create a new AMI from the instance, very little data would have changed on the disk volume. Thus, the AMI and its underlying snapshot would be very quick to create.
If, over time, a lot of information was added/modified on the disk volume(s), then creating an AMI will take longer because more disk blocks have changed.
Creating Snapshots and AMIs can be made faster by taking more frequent snapshots, since this will copy modified blocks to Amazon S3. Thus, each successive snapshot/AMI will require fewer blocks to be copied.
The speed of a snapshot/AMI is not impacted by the assigned IOPS to a volume. The snapshot process takes place in the back-end, which does not consume the IOPS allocated to a volume.

Changes required to create AMI from OS disk EBS volume manually

I have a VMware VM whose raw OS disk is backed up to AWS S3. I can create an AMI from the raw OS disk using import-image. I cannot use import-image every time, because it is extremely slow and because I am building an application that backs up a VM to the AWS cloud: the first backup is a FULL backup, which takes longer, but each subsequent INCREMENTAL backup should take much less time (depending on the amount of data changed). I create an AMI during every backup, whether FULL or INCREMENTAL.
So it is acceptable and explainable that a FULL backup takes time, but an INCREMENTAL backup should take less.
The problem is that, when creating an AMI from RAW data during an incremental backup, AWS is not aware that an AMI (and its corresponding EBS snapshot) already exists from the FULL backup. It should compare that snapshot with the latest data to find the changes and create the AMI from the changed data only, which would take less time.
So I have the following options:
1) import-snapshot API = converts the raw OS disk to EBS snapshot file.
2) copy OS disk data = create an EBS volume and attach it to a running EC2 instance, copy all the raw OS disk data to the volume, then create a snapshot from the EBS volume. From the EBS snapshot, we can create an AMI.
I have tried both options, but every time I try to launch an EC2 instance from the AMI, I get the error below:
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(8,0)
After going through various forums, I learned that this error occurs when there is a mismatch in the AKI and ARI used while creating the AMI from a snapshot. The correct AKI and ARI are normally taken from the source EC2 instance from which the snapshot was created (this is what AWS expects).
In my case, I did not create the snapshot from a running EC2 instance but from a VMware VM OS disk.
I figured out that the import-image API also creates snapshots while creating the AMI, so I compared the snapshot created by import-image with the snapshots I created using option 1 and option 2.
I compared the list of files in /boot/ and their md5sums. I found that the snapshot created by the AWS import-image API contains an "initramfs-3.10.0-327.36.3.el7.x86_64.img.vmimport" file and has many modified files in the /boot/grub2 directory.
As per AWS documentation, https://docs.aws.amazon.com/vm-import/latest/userguide/vm-import-ug.pdf,
AWS modifies the filesystem:
- installs Citrix PV drivers either directly in the OS or modifies initrd/initramfs to contain them,
- modifies /etc/fstab,
- modifies GRUB bootloader settings such as the default entry and timeout.
So, do I need to make the above changes in my EBS volume as well? How can I make these changes (code, script, tool, etc.)?
Please suggest a better option if anyone has one.
I explored Packer but found that it needs a source_ami to create an AMI, so it is not applicable to me, as I am not creating the AMI from a source AMI. Please correct me if I am wrong.

Where are the OS and its settings stored in an EC2 instance?

When using an EC2 Windows instance with instance storage (let's say a 32GB SSD), where are the OS and its settings stored, such as Program Files and user profiles? Are they all stored on instance storage? As far as I understand from other topics, instance storage is not persistent and doesn't survive shutdowns/terminations. Does that mean I will lose everything on the C: drive if I turn it off?
Can I use EBS storage as the default storage for the OS (C: drive)? Can I map multiple EBS storages to one Windows storage?
If the above is true, will I be charged for the capacity used by the OS on the EBS volume? It would be around 20GB, I believe. Is that correct?
I am quite new to AWS, and before paying for such instances or EBS I would like to know how the technical and billing model works.
Thank you!

The storage used for the root device depends on the AMI (EBS-backed or instance store-backed) used to launch the instance.
"As far as I understand from other topics, instance storage is not persistent and doesn't survive shutdowns/terminations."
If the root device is instance store, stopping (shutting down) the instance is not possible. On termination, neither the storage nor the instance survives. The instance does not survive termination even if the AMI is EBS-backed, but you can keep the root volume by setting its DeleteOnTermination flag to False.
"Does that mean I will lose everything on the C: drive if I turn it off?"
You cannot turn off (shut down) an instance store-backed instance.
"Can I use EBS storage as the default storage for the OS (C: drive)?"
Yes, choose an EBS-backed Windows AMI.
"Can I map multiple EBS storages to one Windows storage?"
Yes, multiple EBS volumes can be attached to one EC2 Windows instance.
"If the above is true, will I be charged for the capacity used by the OS on the EBS volume?"
You will be charged for the total size of the EBS volumes attached to the instance, including the root device.
"It would be around 20GB, I believe. Is that correct?"
The EBS volume size is adjustable. The upper size limit is 16TiB.
See the AWS documentation pages "Storage for the Root Device" and "Amazon EC2 Root Device Volume".
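The DeleteOnTermination setting mentioned in this answer can be changed on a running EBS-backed instance. Here is a minimal sketch, assuming a boto3 `ec2` client is passed in; the helper names are hypothetical, and the root device name (for example "/dev/sda1") is an assumption that should be read from the instance's own description.

```python
from typing import Any, Dict


def keep_root_volume_request(instance_id: str, root_device_name: str) -> Dict[str, Any]:
    """Build the arguments that turn off DeleteOnTermination for the root volume."""
    return {
        "InstanceId": instance_id,
        "BlockDeviceMappings": [
            {"DeviceName": root_device_name, "Ebs": {"DeleteOnTermination": False}}
        ],
    }


def keep_root_volume(ec2_client: Any, instance_id: str, root_device_name: str) -> None:
    """Apply the setting; `ec2_client` is assumed to behave like boto3.client("ec2")."""
    ec2_client.modify_instance_attribute(
        **keep_root_volume_request(instance_id, root_device_name)
    )
```

Splitting the request builder from the API call keeps the part that matters (the block device mapping) easy to inspect without touching AWS.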

Please spend more time with the AWS documentation; I don't think there is enough room here to cover all your questions.
Only specific EC2 instance types come with attached SSD storage, also known as instance storage. Bear in mind that instance storage does not come with snapshot capabilities, so you must back up the files yourself. It is meant for people who need the fastest disk access to process their data.
Only EBS allows you to take multiple snapshots.
You can always create an AMI for your instance after completing the deployment. The AMI is stored as an EBS snapshot, so you will not lose the initial instance if you do this; for a new instance, you just launch it from the AMI.
If you "Terminate" an instance, it deletes the virtual machine. There is no way to recover it, even with EBS, unless you have made a snapshot. However, attached EBS storage will not be deleted.
EBS is billed per GB and gives you 3 IOPS per GB, with a baseline of 100 IOPS. This is not enough for anyone who wants to carry out disk I/O-intensive tasks.
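The sizing rule in the last paragraph can be written out as arithmetic. This is a small sketch only: the function names are hypothetical, the 16000 IOPS cap is an assumption that is not in the original answer, and the price per GB-month is deliberately left as a parameter because it varies by region and volume type.

```python
def baseline_iops(size_gib: int, per_gib: int = 3, floor: int = 100, cap: int = 16000) -> int:
    """3 IOPS per GiB with a floor of 100; the cap value is an assumption."""
    return min(cap, max(floor, per_gib * size_gib))


def monthly_storage_cost(size_gib: int, price_per_gib_month: float) -> float:
    """EBS bills for the provisioned volume size, not for the space the OS uses."""
    return size_gib * price_per_gib_month
```

Under this rule a 20 GiB root volume stays at the 100 IOPS floor, while a 100 GiB volume gets 300 IOPS.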

How is AMI storage in S3 charged?

We have a script that creates a couple of AMIs. On successful creation of an AMI, it deletes the old AMI. As I understand it, AWS only charges for the space used in S3 to store the snapshot that was created for an AMI.
If I create two AMIs of an instance on different dates (those two AMIs create two different snapshots), will they charge only for the size of the new snapshot, or for both snapshots?
How does AWS charge for this process?

An Amazon Machine Image (AMI) is actually a standard EBS snapshot, with additional metadata.
EBS snapshots are incremental in nature, meaning that only blocks that have been added or changed are copied to Amazon S3.
This means that successive snapshots could incur very little additional cost. Imagine this scenario:
AMI 1 is created from an instance (or, more accurately, from the EBS volumes associated with the instance)
Some data is changed on the EBS volumes
AMI 2 is created from the instance
Since each AMI is a snapshot, and snapshots only copy incremental data to S3, then the additional cost for AMI 2 would only be the new/modified blocks.
If AMI 1 were then deleted, the cost drop would be minimal, since most blocks contained in the AMI 1 snapshot would be kept for the AMI 2 snapshot.
One interesting result of all this (which is merely my suspicion, since I could not find any official statement to this effect) is that, if your AMI is based off an existing volume (eg an AMI from Amazon), then any snapshots/AMIs created of that volume will actually inherit blocks that are part of the original snapshot. Therefore, you (probably) do not pay the full cost of storing that AMI since the snapshot points to blocks already in a snapshot originally created by Amazon. (Don't worry if you didn't understand all that!)
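The AMI 1 / AMI 2 scenario in this answer can be illustrated with a toy model. This is not how AWS implements snapshots; it is a sketch that assumes each snapshot is a set of (block index, version) pairs and that a block is stored once no matter how many snapshots reference it. All names and block counts are made up for illustration.

```python
from typing import Dict, Set, Tuple

Block = Tuple[int, int]  # (block index, version of that block's contents)


def stored_blocks(snapshots: Dict[str, Set[Block]]) -> int:
    """Count the distinct blocks that must be kept for the given snapshots."""
    unique: Set[Block] = set()
    for blocks in snapshots.values():
        unique |= blocks
    return len(unique)


# AMI 1: 100 blocks, all at version 0.
ami1 = {(i, 0) for i in range(100)}
# Blocks 0-4 are modified on the volume, then AMI 2 is created.
ami2 = {(i, 1) for i in range(5)} | {(i, 0) for i in range(5, 100)}

both = stored_blocks({"ami1": ami1, "ami2": ami2})
extra_for_ami2 = both - stored_blocks({"ami1": ami1})
freed_by_deleting_ami1 = both - stored_blocks({"ami2": ami2})
```

In this model the second AMI adds only the five changed blocks, and deleting the first AMI frees only the five old versions that nothing else references.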

AMIs are standalone in nature and are treated individually, even when they are created from the same instance on the same day. So the charges for AMI storage would be calculated twice.
For the record, AMI creation involves snapshotting behind the scenes, and AWS performs those snapshots by calculating the delta, so the AMI creation process would be faster the second time, but they are still treated as two individual copies.

Amazon EC2 EBS backup: AMI vs Snapshot

I am trying to create a backup mechanism for our server, so that if my system crashes, I can recreate the whole system by running a single script.
After going through the Amazon documentation, this is my understanding of creating a backup and restoring:
Backup
- Create an AMI (this can be updated monthly)
- Create a snapshot (this can be done by a daily script)
Restore (a script to)
- Create an EBS instance using the AMI
- Attach the EBS volume to the instance created
Now my questions are:
1) Is this the best way to take a backup and restore?
2) Do we actually need to back up two things, the AMI and the EBS volume (using a snapshot)? Can we just keep snapshots?
3) I understand this cannot work for a local instance store instance, as there is no snapshot functionality. So how can I create a backup and restore process for local instance store instances?

As I could not find any better alternative, I am sticking with the initial approach.
For EBS
Backup:
- Create an AMI (this can be updated monthly).
- Create a snapshot (this can be done by a daily script).
Restore (a script to)
- Create an EBS instance using the AMI.
- Attach the EBS volume to the instance created.
For instance store, I am only keeping the application (no database), so there is no need to keep a backup of that.

EBS Snapshots are an excellent way to create backups.
You can perform frequent Snapshots of your EBS Volumes via scripts. Weekly, Daily, Hourly, or as frequently as your Credit Card will allow. The only limit is around how many simultaneous snapshots you can be doing - when you hit that, the EBS API will start giving back errors until a few of the in-flight operations complete.
Snapshots can also be copied from Region to Region in order to provide backup against a catastrophic event.
When you snapshot an EBS volume, that snapshot is of the entire volume. Even if it was created from an AMI, your snapshot contains everything you need to create a new instance of the volume. You can pretty easily try this yourself.
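The scripted snapshots described in this answer could look roughly like the sketch below. It assumes a boto3 `ec2` client is passed in; the function name and the description format are hypothetical, and the retry handling for the concurrent-snapshot limit mentioned above is left out.

```python
from datetime import datetime, timezone
from typing import Any, Iterable, List


def snapshot_volumes(ec2_client: Any, volume_ids: Iterable[str]) -> List[str]:
    """Start one snapshot per volume and return the new snapshot IDs.

    `ec2_client` is assumed to behave like boto3.client("ec2").
    """
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    snapshot_ids: List[str] = []
    for volume_id in volume_ids:
        response = ec2_client.create_snapshot(
            VolumeId=volume_id,
            Description=f"scheduled backup {stamp} of {volume_id}",
        )
        snapshot_ids.append(response["SnapshotId"])
    return snapshot_ids
```

A scheduler such as cron could run this daily or hourly with the list of volumes to protect.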

If your instances are Linux-based, there is no need to create an AMI if you're taking snapshots. You can create the AMI on the fly, from the snapshots, when you need to recover. If you have that process automated, it's pretty easy to do.
In Windows there is a limitation that prevents launching an EC2 instance from a snapshot, so AMIs must be used. There are ways to work around that limitation. You can check out this post I wrote on our company's blog:
http://www.n2ws.com/blog/3-ways-ec2-windows-backup-and-recovery.html
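For the Linux case in this answer, creating an AMI on the fly from a root-volume snapshot might be sketched as follows. It assumes a boto3 `ec2` client is passed in; the function name is hypothetical, and the root device name, architecture and HVM virtualization type are assumptions that have to match the original instance.

```python
from typing import Any


def register_ami_from_snapshot(
    ec2_client: Any,
    snapshot_id: str,
    name: str,
    root_device_name: str = "/dev/xvda",  # assumption: must match the source instance
    architecture: str = "x86_64",         # assumption
) -> str:
    """Register an EBS-backed AMI whose root volume is restored from a snapshot."""
    response = ec2_client.register_image(
        Name=name,
        Architecture=architecture,
        RootDeviceName=root_device_name,
        VirtualizationType="hvm",
        BlockDeviceMappings=[
            {"DeviceName": root_device_name, "Ebs": {"SnapshotId": snapshot_id}}
        ],
    )
    return response["ImageId"]
```

The returned image ID can then be used to launch a replacement instance during recovery.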

I would suggest using Auto Scaling in addition to EBS snapshots. If an instance dies because of a hardware failure, or is scheduled for retirement by Amazon, Auto Scaling will start a new instance automatically.
But in this case, you have to set up a NAS for your dynamic data. Depending on server load, the number of running instances will vary, and all your scaled servers must mount the NAS storage that is shared across them.
Your database should be on a separate server or servers as well. Or you might want to use Amazon RDS, as it has great auto-backup / point-in-time restore features, but you have to pay extra for that.

1) Yes. Snapshots are the best way to back up and restore EBS volumes.
2) It depends. If the root volume comes from an EBS-backed AMI, you can snapshot it as well, which improves manageability.
3) For instance store, rsync and AMIs are the available options.
