AWS Storage Types

I am trying to understand the fundamental differences between several different types of storage available on AWS, specifically:
SSD
Magnetic
"Provisioned IOPS"
Snapshot storage
I am stunned to find no clear definition of each of these in the AWS docs, so I ask: How are these storage types different, and what use cases/scenarios are appropriate for each?

You're referring to Elastic Block Store (EBS). EBS provides persistent block-level storage volumes for Amazon EC2 instances. EBS volumes come in three types:
Provisioned IOPS (SSD)
General Purpose (SSD)
Magnetic
Each type has different performance characteristics and costs. See EBS volume types for more details. The list above is ordered from high to low by both price and potential IOPS.
EBS snapshots are something else entirely. All EBS volumes, regardless of volume type, can be snapshotted and durably stored.
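To make the distinction concrete, here is a minimal boto3 sketch (Python) that creates one volume of each type and snapshots one of them. The region, availability zone, sizes, and IOPS figure are illustrative assumptions, not recommendations.

    import boto3

    # Illustrative region/AZ; substitute your own.
    ec2 = boto3.client("ec2", region_name="us-east-1")

    # General Purpose (SSD): VolumeType "gp2"; IOPS scale with volume size.
    gp2 = ec2.create_volume(AvailabilityZone="us-east-1a", Size=100,
                            VolumeType="gp2")

    # Provisioned IOPS (SSD): VolumeType "io1"; you declare (and pay for)
    # the IOPS explicitly.
    io1 = ec2.create_volume(AvailabilityZone="us-east-1a", Size=100,
                            VolumeType="io1", Iops=3000)

    # Magnetic: the original EBS type, named "standard" in the API.
    mag = ec2.create_volume(AvailabilityZone="us-east-1a", Size=100,
                            VolumeType="standard")

    # Snapshots are orthogonal to volume type: any EBS volume can be
    # snapshotted once it is available.
    ec2.get_waiter("volume_available").wait(VolumeIds=[gp2["VolumeId"]])
    snap = ec2.create_snapshot(VolumeId=gp2["VolumeId"],
                               Description="durable point-in-time copy")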

Instance storage options:
Magnetic - Slowest/cheapest, magnetic-disk-backed storage
SSD - Faster/more expensive, solid-state-backed storage
"Provisioned IOPS" - Fastest/most expensive, with a guaranteed (at the physical level) rate of input/output operations per second
From Google:
IOPS (Input/Output Operations Per Second, pronounced eye-ops) is a common performance measurement used to benchmark computer storage devices like hard disk drives (HDD), solid state drives (SSD), and storage area networks (SAN).
This link has more fine-grained details on SSD/magnetic disk comparisons, though it seems geared toward databases.
Snapshots are backups and are entirely separate from AWS 'hard drive' offerings.
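If you want to see what an IOPS number means in practice, here is a rough sketch that times small random reads against a file. This is illustrative only: testfile.bin is a hypothetical large file on the volume under test, and without O_DIRECT the Linux page cache will inflate the result considerably, so use a real tool like fio for actual benchmarking.

    import os, random, time

    PATH = "testfile.bin"   # hypothetical large test file on the volume
    IO_SIZE = 4096          # 4 KiB reads, a typical benchmark block size
    DURATION = 10           # seconds to run

    size = os.path.getsize(PATH)
    fd = os.open(PATH, os.O_RDONLY)  # no O_DIRECT: page cache inflates results

    ops = 0
    deadline = time.time() + DURATION
    while time.time() < deadline:
        # Pick a random aligned offset and issue one small read.
        offset = random.randrange(0, size // IO_SIZE) * IO_SIZE
        os.pread(fd, IO_SIZE, offset)
        ops += 1
    os.close(fd)

    print(f"approx. {ops / DURATION:.0f} read IOPS (cache-inflated)")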

Related

Amazon RDS cost savings suggestion

I'm looking to reduce my production RDS cost by converting from GP2 to magnetic. Here's an IOPS graph to show how little IOPS my RDS is consuming versus the amount of storage allocated (2TB).
The RDS has an instance type of t2.xlarge with multi-AZ, so you can imagine the amount of workload for this kind of instance. Questions I would like to ask:
Are there any real-world cases where magnetic storage is used for a workload like this?
Assuming it's fine to use magnetic storage for a production RDS, I understand that IOPS will be depleted during the conversion. However, since I have multi-AZ enabled, will the conversion take place on the standby RDS before it is promoted to become the new master?

AWS EBS block size

Can you point me to some resources on how EBS works behind the scenes for gp2 volumes?
The way I understand it, EBS is presented as a service, but underneath it must be some form of SSD array connected to the instance in a redundant way.
What is the actual, physical method of connecting?
The documentation mentions that data is transferred in 16KB or 256KB blocks, but I can't find anything more about that.
If, for example, my Linux partition is formatted with 4KB blocks, does this mean that EBS will transfer data to and from the disk in 16KB blocks? If so, wouldn't it make sense to also format the partition with 16KB blocks and optimize upstream as well?
If I have a set of very random 4KB operations, will this trigger the same number of 16KB block requests?
If anyone's done such testing already, I'd really like to hear it...
The actual, physical means of connection is over the AWS software-defined Ethernet LAN; EBS is essentially a SAN. The volumes are not physically attached to the instance, but they are physically within the same availability zone, and access is over the network.
If the instance is "EBS Optimized," there's a separate allocation of Ethernet bandwidth for communication between the instance and EBS. Otherwise, the same Ethernet connection that handles all of the IP traffic for the instance is also used by EBS.
The SSDs behind EBS gp2 volumes are 4KiB page-aligned.
See AWS re:Invent 2015 | (STG403) Amazon EBS: Designing for Performance beginning around 24:15 for this.
As explained in AWS re:Invent 2016: Deep Dive on Amazon Elastic Block Store (STG301), an EBS volume is not a physical volume. They're not handing you an SSD drive. An EBS volume is a logical volume that spans numerous distributed devices throughout the availability zone. (The blocks on the devices are also replicated within EBS within the availability zone to a second device.)
These factors should make it apparent that the performance of the actual SSDs is not an especially significant factor in the performance of EBS. EBS, by all appearances, allocates resources in proportion to what you're paying for the volume... which is of course directly proportional to the size of the volume as well as which feature set (volume type) you've selected.
16KiB is the nominal size of an I/O that EBS uses for establishing performance benchmarks for gp2. It probably has no other special significance, as it appears to be related as much or more to the processing resources that EBS allocates to your volume as to the media devices themselves -- EBS volumes live in storage clusters that have "resources" of their own (CPU, memory, network bandwidth, etc.) and 16KiB seems to be a nominal value related to some kind of resource allocation in the EBS infrastructure.
Note that the sc1 and st1 volumes use a very different nominal I/O size: 1 MiB. Obviously, that can't be related to anything about the physical storage devices, so this lends credence to the conclusion that the 16KiB number for gp2 (and io1) is an accounting unit in the EBS infrastructure rather than a property of the underlying media.
A gp2 volume can perform up to the lowest of several limits (a small calculator sketch follows the footnote below):
160 MiB/second of throughput, depending on the connected instance type‡
The current number of instantaneous IOPS available to the volume, which is the highest of:
  100 IOPS regardless of volume size,
  3 IOPS per provisioned GiB of volume size, or
  the IOPS credits available in your token bucket, which allow bursting up to 3,000 IOPS
10,000 IOPS per volume, regardless of how large the volume is
‡Smaller instance types can't provide 160MiB/second of network bandwidth, anyway. For example, the r3.xlarge has only half a gigabit (500 Mbps) of network bandwidth, limiting your total traffic to EBS to approximately 62.5 MiB/sec, so you won't be able to push any more throughput to an EBS volume than this from an instance of that type. Unless you are using very large instances or very small volumes, the most likely constraint on your EBS performance is going to be the limits of the instance, not the limits of EBS.
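To pull these limits together, here is a small calculator sketch. The constants come from the limits described above; the Mbps-to-MiB/s conversion uses the same rough arithmetic as the footnote (500 Mbps ≈ 62.5 MiB/s).

    def gp2_max_iops(volume_gib, burst_credits_available=True):
        """Instantaneous IOPS available to a gp2 volume, per the rules above."""
        baseline = max(100, 3 * volume_gib)   # 100 IOPS floor, 3 IOPS/GiB
        if burst_credits_available:
            baseline = max(baseline, 3000)    # burst while credits remain
        return min(baseline, 10000)           # hard per-volume cap

    def gp2_max_throughput_mib(volume_gib, instance_mbps, io_kib=16,
                               burst_credits_available=True):
        """Lowest of: IOPS x I/O size, the 160 MiB/s gp2 ceiling, and the
        instance's network bandwidth (rough Mbps -> MiB/s, as in the text)."""
        iops = gp2_max_iops(volume_gib, burst_credits_available)
        iops_bound = iops * io_kib / 1024
        instance_bound = instance_mbps / 8
        return min(iops_bound, 160, instance_bound)

    # r3.xlarge (~500 Mbps) with a 500 GiB gp2 volume:
    print(gp2_max_iops(500))                             # 3000 while bursting
    print(gp2_max_throughput_mib(500, 500))              # 46.875: IOPS-limited at 16 KiB
    print(gp2_max_throughput_mib(500, 500, io_kib=256))  # 62.5: instance-limited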
Since you are capped at the lowest threshold in the list above, the impact of the nominal 16KiB I/O size is this: if your I/Os are smaller than 16KiB, your maximum possible IOPS does not increase, and if they are larger, your maximum possible IOPS may decrease:
an I/O size of 4KiB will not improve performance, since the nominal size of an I/O for rate-limiting purposes is established at 16KiB, but
an I/O size of 4KiB is unlikely to meaningfully decrease performance with sequential I/Os, since sequential I/Os are internally combined for EBS's accounting purposes. So, if your instance makes 4 × 4KiB sequential I/O requests, EBS is likely to count that as 1 I/O anyway (see the sketch after this list)
an I/O size of 4KiB with extremely random I/Os would indeed not be combined, and so would theoretically perform poorly relative to the same number of 16KiB extremely random I/Os, but instinct and experience tell me this borders on academic and theoretical territory except perhaps in extremely rare cases. Padding to 16KiB could just as easily hurt as help, since small writes would use the same number of IOPS but transfer more unnecessary data across the wire.
if your I/Os are larger than 16KiB, your maximum IOPS will decrease if your disk bandwidth reaches the 160MiB/s threshold before reaching the IOPS threshold.
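Since the merge behavior above drives the accounting, here is a toy model of it. The 256 KiB merge ceiling is an assumption for illustration, not a figure from this answer.

    def accounted_ios(requests, merge_limit=256 * 1024):
        """requests: list of (offset, size) pairs in bytes, in issue order.
        Contiguous requests merge into one accounted I/O up to merge_limit
        (an assumed ceiling); everything else counts individually."""
        total, run, prev_end = 0, 0, None
        for offset, size in requests:
            if prev_end == offset and run + size <= merge_limit:
                run += size            # contiguous: extend the current I/O
            else:
                if prev_end is not None:
                    total += 1         # close out the previous accounted I/O
                run = size
            prev_end = offset + size
        return total + (1 if prev_end is not None else 0)

    # Four sequential 4 KiB reads count as one I/O...
    seq = [(i * 4096, 4096) for i in range(4)]
    # ...while four scattered 4 KiB reads count as four.
    rand = [(i * 10**6, 4096) for i in range(4)]
    print(accounted_ios(seq), accounted_ios(rand))   # -> 1 4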
A final thought: EBS performs best under load. That is to say, a single thread making a series of random I/Os will not keep the EBS volume's queue filled with requests, and when the queue is not kept full, you will not see the maximum possible performance.
See also Amazon EBS Volume Performance on Linux Instances for more discussion of EBS performance.

Why can't I use the new st1/sc1 EBS volumes by AWS as root volumes

AWS recently launched the sc1 and st1 HDD EBS volume types, but I can't seem to use them as root volumes when launching new EC2 instances or launching from already-created AMIs (I tried both).
I chose an m4 instance, so the root volume is EBS in any case. The second volume that I add gets the new options, but I can't choose them for the first (root) volume. Is this by design, AWS people?
If you look at http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html,
under the main table, the entries for Throughput Optimized HDD (st1) and Cold HDD (sc1) say
Cannot be a boot volume
and below
Throughput Optimized HDD (st1) volumes provide low-cost magnetic storage that defines performance in terms of throughput rather than IOPS. This volume type is a good fit for large, sequential workloads such as Amazon EMR, ETL, data warehouses, and log processing. Bootable st1 volumes are not supported.
and
Cold HDD (sc1) volumes provide low-cost magnetic storage that defines performance in terms of throughput rather than IOPS. With a lower throughput limit than st1, sc1 is a good fit for large, sequential cold-data workloads. If you require infrequent access to your data and are looking to save costs, sc1 provides inexpensive block storage. Bootable sc1 volumes are not supported.
Because the customer experience would be awful. Boot volumes use small, random I/O; these volumes aren't designed for small I/O. Just use GP2 for boot volumes.
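As a hedged illustration of that advice, a boto3 launch request that puts the root volume on gp2 and the bulk data on st1 might look like this; the AMI ID and device names are placeholders, and the region and sizes are illustrative.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    resp = ec2.run_instances(
        ImageId="ami-xxxxxxxx",      # placeholder AMI
        InstanceType="m4.large",
        MinCount=1, MaxCount=1,
        BlockDeviceMappings=[
            # Root volume: gp2 (st1/sc1 are not bootable).
            {"DeviceName": "/dev/xvda",
             "Ebs": {"VolumeType": "gp2", "VolumeSize": 30}},
            # Data volume: st1 suits large, sequential workloads.
            {"DeviceName": "/dev/xvdb",
             "Ebs": {"VolumeType": "st1", "VolumeSize": 500}},
        ],
    )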

AWS EBS references

Perhaps this isn't so much a code question as a definitional question, but would someone be able to explain to me what the six line items below represent?
EBS has three types of storage (in order from most expensive to cheapest):
Provisioned IOPS. These are SSD volumes with a performance guarantee. With these volumes you pay not only for the size of the volume but also for the number of IOPS you have provisioned. They should only be used when performance is very important.
General Purpose SSD. These volumes provide improved performance over Magnetic volumes at a somewhat higher cost. Probably the best choice for most general purpose uses.
Magnetic. This type of storage uses magnetic disks and is the cheapest and slowest. Good for bulk data storage that doesn't have any performance requirement.
The other two items not covered by the volume types above are I/O requests, which are billed any time data blocks are read from or written to a volume, and snapshots, which are copies of volumes stored on S3.
Amazon Elastic Block Store is offered in three flavors: Magnetic, PIOPS SSD, and General Purpose SSD.
Each flavor offers different performance and pricing, which you can check on the EBS pricing page.
Those lines look like a budget showing how much of each your project consumes :)
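If it helps to decode a bill like that, here is a minimal sketch of how the line items combine. The rates below are illustrative assumptions only; check the EBS pricing page for current figures. The six components (gp2 storage, io1 storage, io1 provisioned IOPS, magnetic storage, magnetic I/O requests, and snapshot storage) plausibly map to six such line items.

    # Illustrative rates only -- NOT current prices; see the EBS pricing page.
    PRICES = {
        "gp2_gb_month":        0.10,   # General Purpose SSD, per GB-month
        "io1_gb_month":        0.125,  # Provisioned IOPS SSD, per GB-month
        "io1_iops_month":      0.065,  # per provisioned IOPS-month
        "magnetic_gb_month":   0.05,   # Magnetic, per GB-month
        "magnetic_million_io": 0.05,   # Magnetic I/O requests, per million
        "snapshot_gb_month":   0.095,  # Snapshot storage on S3, per GB-month
    }

    def monthly_cost(gp2_gb=0, io1_gb=0, io1_iops=0, magnetic_gb=0,
                     magnetic_million_ios=0, snapshot_gb=0):
        return (gp2_gb * PRICES["gp2_gb_month"]
                + io1_gb * PRICES["io1_gb_month"]
                + io1_iops * PRICES["io1_iops_month"]
                + magnetic_gb * PRICES["magnetic_gb_month"]
                + magnetic_million_ios * PRICES["magnetic_million_io"]
                + snapshot_gb * PRICES["snapshot_gb_month"])

    print(monthly_cost(gp2_gb=100, snapshot_gb=40))   # -> ~13.80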

Do Amazon High I/O instance guarantee disk persistence?

The High I/O instances in EC2 use SSDs. How does one run a database on such an instance while guaranteeing persistence of data?
From my limited understanding, I'm supposed to use Elastic Block Store (EBS) so that even if the machine goes down, the data on the disk doesn't disappear. The instance-storage SSD of a High I/O instance, on the other hand, is ephemeral and can't be used for database storage because, if the machine loses power, for example, the data image isn't preserved. Is my understanding correct?
Point 1) If your workload needs high-I/O SSDs for the DB, then you should have a master-slave setup. Ideally, one master and two slaves spread across three AZs is suggested. Even if there is an outage in a single AZ, the alternate AZs can handle the load and serve your high-availability needs. Between master and slaves you can employ synchronous, semi-synchronous, or asynchronous replication, depending on your DB. This solution is costlier.
Point 2) Generally, if your DB is OLTP in nature, Amazon EBS PIOPS + EBS-optimized instances give you consistent IOPS. A single EBS volume can provide 4,000 IOPS, and you can RAID 0 multiple volumes to gain 10k+ IOPS for performance (a sketch follows below). Lots of customers are taking this route in AWS. Even though you may use EBS for persistence, it is still recommended to go with a master-slave architecture for high availability. I have written detailed articles on this topic on my blog; refer to them for more information.
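As a sketch of that RAID 0 route, something like this creates and attaches the member volumes; striping them together (e.g. with mdadm) then happens inside the instance. The instance ID, AZ, sizes, and device names are placeholders.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Three io1 volumes at 4000 IOPS each; striped as RAID 0 inside the
    # instance, they can serve roughly 12000 aggregate IOPS.
    volumes = [
        ec2.create_volume(AvailabilityZone="us-east-1a", Size=400,
                          VolumeType="io1", Iops=4000)
        for _ in range(3)
    ]
    for i, vol in enumerate(volumes):
        ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])
        ec2.attach_volume(VolumeId=vol["VolumeId"],
                          InstanceId="i-0123456789abcdef0",  # placeholder
                          Device="/dev/sd" + chr(ord("f") + i))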
It is the same as other ephemeral storage: it does not guarantee persistence. Persistence is handled by replication between instances, with at least one instance writing to an EBS volume.
If you want your data to persist, you're going to need to use EBS. Building a database on an ephemeral drive, regardless of performance, seems a dubious design choice.
EBS now offers volumes with up to 4,000 provisioned IOPS, which, depending on your database requirements, is quite possibly more than sufficient.
My next question would really be: Do you want to host/run your own database?
Turnkey products such as RDS and DynamoDB may be sufficient for your needs. Using them is much easier than setting up and managing your own database. RDS is now advertising "You can now provision up to 3TB and 30,000 IOPS per DB Instance". That's enough database horsepower for many, many problem sets.
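For comparison, provisioning a managed RDS instance with PIOPS storage is a single call. A hedged boto3 sketch; identifiers, credentials, and sizes are placeholders:

    import boto3

    rds = boto3.client("rds", region_name="us-east-1")

    # Provisioned-IOPS storage on a managed MySQL instance; values illustrative.
    rds.create_db_instance(
        DBInstanceIdentifier="mydb",        # placeholder
        DBInstanceClass="db.m4.xlarge",
        Engine="mysql",
        MasterUsername="admin",
        MasterUserPassword="change-me",     # placeholder -- use a real secret
        AllocatedStorage=1000,              # GiB
        StorageType="io1",
        Iops=10000,                         # must satisfy RDS's IOPS:storage ratio
        MultiAZ=True,                       # managed standby for failover
    )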