Why AMI creation is taking long time? - amazon-web-services

I am trying to create a AMI from an instance with a root device of 160GB in size. This root volume is of type io1 with an iops of 1250.
In my AWs account, creating an AMI takes about 5 minutes. This is with data about more than 100GB.
On the customer's AWS account, the same configuration takes over 20+ minutes.
I have tested this with many repetitions and I get almost similar results all time.
Any idea why the AMI creation varies so much between multiple AWS accounts?

An AMI consists of snapshots of Amazon EBS volumes attached to the instance.
Snapshots consist of "differences" from the previous snapshot (including the original AMI that was used to launch the instance).
For example, if you were to launch a new instance from an AMI and then immediately create a new AMI from the instance, very little data would have changed on the disk volume. Thus, the AMI and its underlying snapshot would be very quick to create.
If, over time, a lot of information was added/modified on the disk volume(s), then creating an AMI will take longer because more disk blocks have changed.
Creating Snapshots and AMIs can be made faster by taking more frequent snapshots, since this will copy modified blocks to Amazon S3. Thus, each successive snapshot/AMI will require fewer blocks to be copied.
The speed of a snapshot/AMI is not impacted by the assigned IOPS to a volume. The snapshot process takes place in the back-end, which does not consume the IOPS allocated to a volume.

Related

Which point in time is reflected in the files of an EC2 AMI taken while rebooting?

If you take an AMI from an EC2, and the AMI takes, say, 1 hour to be available; and you choose the option not to skip the reboot.
All the files in the AMI will:
a) reflect their exact condition from the time the EC2 was rebooted? or
b) they may reflect any condition in this 1 hour interval which is what it took for the AMI to be available.
I always considered option a, but I'm not so sure any more, specially after I noticed that when you take an AMI in the console, it gives this message:
"Currently creating AMI ..... Check that the AMI status is 'Available' before deleting the instance or carrying out other actions related to this AMI."
I want to know if it's safe to start applying changes in an EC2 instance after an AMI is requested and the EC2 rebooted, but before the AMI is available.
An Amazon Machine Image (AMI) will contain a copy of the disk at it was at exactly at the point in time when the API call was issued.
Or, if the instance is rebooted as part of the image creation, it will contain a copy of the disk as it was between the time when the operating system shutdown and when the operating system started again.
The time taken for an AMI to become available involves copying disk blocks to the Snapshot used by the AMI. Any disk changes during that time will not be reflected in the AMI. This is possible because the disk is virtual. (It's a bit like a database being able to roll-back due to the use of log files.)
From Create Amazon EBS snapshots - Amazon Elastic Compute Cloud:
Snapshots occur asynchronously; the point-in-time snapshot is created immediately, but the status of the snapshot is pending until the snapshot is complete (when all of the modified blocks have been transferred to Amazon S3), which can take several hours for large initial snapshots or subsequent snapshots where many blocks have changed. While it is completing, an in-progress snapshot is not affected by ongoing reads and writes to the volume... snapshots only capture data that has been written to your Amazon EBS volume at the time the snapshot command is issued.

AWS EC2 get the AMI size

I am Using ec2 and creating a custom AMI using a snapshot for my machine. I Can't find how to get it's actual size on disk to calculate the cost it will charge me to store it.
How can I get the actual compressed size or estimated cost for storing an AMI (snapshot of a machine)
AMI sizes are not easy to calculate. This is because they are based on Amazon EBS snapshots, and EBS snapshots are incremental in nature.
For example:
Let's say you launch an instance from an Amazon Linux AMI
You then login and create a file on the disk
You then create a new AMI
Any blocks that have been added or changed (eg for the new file on the disk) will be included in your AMI (or, more accurately, in the snapshot that stores the AMI data). However, all the blocks that were not modified from the original AMI are not stored in your AMI. Instead, the AMI will contain a reference to the blocks in the original AMI. This is due to the incremental nature of snapshots.
So, the reality is that most of an AMI (most of a snapshot) actually contains pointers to existing data and it therefore is not charged to you. You will only pay for storage that does not already exist.
That's why you can't really get the storage size of an AMI.
The only way to know the storage size for sure would be to create a totally new AMI that is not based on an existing AMI, since you would then be charged for the total size. (I wouldn't recommend doing so.)

EBS Snapshots, who manages backups?

I'm starting with AWS and I've been taking my first steps on EC2 and EBS. I've learnt about EBS snapshots but I still don't understand if the backups, once you've created a snapshot, are managed automatically by AWS or I need to do them on my own.
AWS just introduced a new feature called Lifecycle Manager (in the EC2 Dashboard, at the bottom left) that allows you to create automated backups for your volumes. Once you configure a policy, AWS will handle the backup process for your volumes.
This is only a couple of weeks old so just wanted to mention here.
Snapshots are managed by AWS
snapshot of an EBS volume, can be used as a baseline for new volumes or for data backup. If you make periodic snapshots of a volume, the snapshots are incremental—only the blocks on the device that have changed after your last snapshot are saved in the new snapshot. Even though snapshots are saved incrementally,
the built in durability of EBS is comparable to a RAID in the physical sense. The data itself is mirrored (think more like a RAID stripe though) in the availability zone where the volume exists. Amazon states that the failure rate is somewhere around 0.1-0.5% annually. This is more reliable than most physical RAID setups

AWS EC2 instance snapshot in another region

i m running ec2 instance in 1 region i want to create snapshots of ec2 instances in other region directly without coping and cross region replication in s3, is this possible? if possible then how?
Amazon EBS Snapshots are created in the same region as the original EBS Volume. They can then be used to create a new Volume within the same Region.
If you wish to use an Amazon EBS Snapshot in a different region, the snapshot must first be copied to the other Region. This can done via the Amazon EC2 management console, the AWS Command-Line Interface (CLI) aws ec2 copy-snapshot command, or an AWS API call.
Please note that snapshots are incremental backups. The first snapshot isn't really a full backup. Rather, every snapshot simply copies any blocks that have been modified since any previous snapshot. Blocks are retained while snapshots still require the blocks. This means that blocks made during the initial snapshot could actually be deleted if they are not required by any active snapshots. This is why I say they are not the same as a full backup, which traditionally never has content deleted.
However, when a snapshot is copied to a new region it is copied in full, rather than incrementally.
If you do not with to copy an EBS snapshot between regions, you would need to find a different way to transfer the disk volume (eg filesystem-level synchronisation).
In fact, there should typically be no need to transfer a disk volume -- rather, your systems should be capable of configuring a new server based upon a startup configuration script and data should be stored in a separate database so that it is accessible to multiple instances. It is a very rare case that requires a complete copy of a disk volume.

How AMI to S3 costs

We a script to create couple of AMI, On successful completion of AMI it deletes the old AMI. As of my understanding AWS only charges for the space we use in S3 for storing snapshot that was created by an AMI.
If I created two AMIs for an instance on different dates(those two AMIs create two different snapshots). will they charge for only new snapshot size? or for the two snapshots?
How AWS charges for this process?
An Amazon Machine Image (AMI) is actually a standard EBS snapshot, with additional metadata.
EBS snapshots are incremental in nature, meaning that only blocks that have been added or changed are copied to Amazon S3.
This means that successive snapshots could incur very little additional cost. Imagine this scenario:
AMI 1 is created from an instance (or, more accurately, from the EBS volumes associated with the instance)
Some data is changed on the EBS volumes
AMI 2 is created from the instance
Since each AMI is a snapshot, and snapshots only copy incremental data to S3, then the additional cost for AMI 2 would only be the new/modified blocks.
If AMI 1 were then deleted, the cost drop would be minimal, since most blocks contained in the AMI 1 snapshot would be kept for the AMI 2 snapshot.
One interesting result of all this (which is merely my suspicion, since I could not find any official statement to this effect) is that, if your AMI is based off an existing volume (eg an AMI from Amazon), then any snapshots/AMIs created of that volume will actually inherit blocks that are part of the original snapshot. Therefore, you (probably) do not pay the full cost of storing that AMI since the snapshot points to blocks already in a snapshot originally created by Amazon. (Don't worry if you didn't understand all that!)
AMIs are stand alone in nature and treated individually no matter whether created out of the same instance on the same day. So they charges of the AMI storage would be calculated 2 times.
For the record the AMI creation involves the snapshotting behind the scenes and AWS performs those snapshots calculating the delta; so the AMI creation process would faster the second time but still they are treated a 2 individual copies.