AWS EBS IOPS, what happens when it gets exceeded

AWS EBS IOPS, what happens when it gets exceeded - amazon-web-services

Assuming I have a gp3 SSD backed EBS with 3000 IOPS with block size of 256kb & througput of 128Mbps ( EBS is not optimized )
What happens to my IO requests if I exceed the 3000 IOPS request? Do they stall? Or is that more of a function of throughput?
Let's take an example
An application does 3000 random IO (so no chance of merging) of 1kb size per second. This means I will be hitting the IOPS limit but throughput is only 3Mbps. Will this cause my further IO operations to stall?
Is it preferable to increase my IO size in that case?

EBS will throttle your I/O in this case and you will not see more than 3000 IOPS. In general, they have throttling at EBS volume level and at EC2 instance level (max/baseline EBS IOPS of EC2 instance type, limit across all attached EBS volumes). You will be throttled as soon as you hit any of those limits.

Related

AWS EFS Throughput not exceeding 1MB

I hope you can assist with a question around EFS throughput that I am seeing on our AWS services.
I have setup an EFS filesystem which I have mounted on 2 x EC2 instances. Originally, I had it configured as burstable throughput.
When looking at the MeteredIOBytes metric, the maximum throughput never exceeds 1.05M. I thought it might be because I am not using much space on EFS and as I understand it, when on burstable throughput, the throughput increases as you use more GBs.
I then tried changing the EFS to provisioned throughput of 5MB, but even after doing this, the MeteredIOBytes metric never exceeds 1.5MB.
efs1
In both burstable and provisioned mode, when comparing the maximum MeteredIOBytes with PermittedThroughput, the MeteredIOBytes is way below Permitted.
efs2
I am only using 3GB of storage space on EFS.
Is there a setting I am missing to increase the throughput beyond 1.5MB?
Thanks

AWS EBS block size

Can you point me to some resources on how EBS works behind the scenes for gp2 volumes?
The way I understand it, it is a service, but really it is some form of connecting arrays of SSD drives to the instance, in a redundant way
What is the actual, physical method of connecting?
THe documentation refers to the fact that data is transferred in 16KB or 256KB blocks, but I can't find any more about that.
If for example, in Linux, my partition is formatted with 4KB blocks, does this mean that EBS will transfer data to and from disk with 16KB block, if so wouldn't it make sense to also format the partition with 16KB block and also optimise it upstream?
If I have a set of very random 4k operations, will this trigger the same amount of 16KB block requests?
If anyone's done such testing already, I'd really like to hear it...

The actual, physical means of connection is over the AWS software-defined Ethernet LAN. EBS is essentially a SAN. The volumes are not physically attached to the instance, but they are physically within the same availability zone, the access is over the network.
If the instance is "EBS Optimized," there's a separate allocation of Ethernet bandwidth for communication between the instance and EBS. Otherwise, the same Ethernet connection that handles all of the IP traffic for the instance is also used by EBS.
The SSDs behind EBS gp2 volumes are 4KiB page-aligned.
See AWS re:Invent 2015 | (STG403) Amazon EBS: Designing for Performance beginning around 24:15 for this.
As explained in AWS re:Invent 2016: Deep Dive on Amazon Elastic Block Store (STG301), an EBS volume is not a physical volume. They're not handing you an SSD drive. An EBS volume is a logical volume that spans numerous distributed devices throughout the availability zone. (The blocks on the devices are also replicated within EBS within the availability zone to a second device.)
These factors should make it apparent that the performance of the actual SSDs is not an especially significant factor in the performance of EBS. EBS, by all appearances, allocates resources in proportion to what you're paying for the volume... which is of course directly proportional to the size of the volume as well as which feature set (volume type) you've selected.
16KiB is the nominal size of an I/O that EBS uses for establishing performance benchmarks for gp2. It probably has no other special significance, as it appears to be related as much or more to the processing resources that EBS allocates to your volume as to the media devices themselves -- EBS volumes live in storage clusters that have "resources" of their own (CPU, memory, network bandwidth, etc.) and 16KiB seems to be a nominal value related to some kind of resource allocation in the EBS infrastructure.
Note that the sc1 and st1 volumes use a very different nominal I/O size: 1 MiB. Obviously, that can't be related to anything about the physical storage device, so this lends credence to the conclusion that the 16KiB number for gp2 (and io1).
A gp2 volume can perform up to the lowest of several limits:
160 MiB/second, depending on the connected instance type‡
The current number of instantaneous IOPS available to the volume, which is the highest of
100 IOPS regardless of volume size
3 IOPS per provisioned GiB of volume size
The IOPS credits available for with in your token bucket, capped at 3,000 IOPS
10,000 IOPS per volume regardless of how large the volume is
‡Smaller instance types can't provide 160MiB/second of network bandwidth, anyway. For example, the r3.xlarge has only half a gigabit (500 Mbps) of network bandwidth, limiting your total traffic to EBS to approximately 62.5 MiB/sec, so you won't be able to push any more throughput to an EBS volume than this from an instance of that type. Unless you are using very large instances or very small volumes, the most likely constraint on your EBS performance is going to be the limits of the instance, not the limits of EBS.
You are capped at the first (lowest) threshold in the list above, the impact of the nominal 16 KiB I/O size is this: if your I/Os are smaller than 16KiB, your maximum possible IOPS does not increase, and if they are larger, your maximum possible IOPS may decrease:
an I/O size of 4KiB will not improve performance, since the nominal size of an I/O for rate limiting purposes is established 16KiB, but
an I/O size of 4KiB is unlikely to meaningfully decrease performance with sequential I/Os since, for EBS's accounting purposes, are internally combined. So, if your instance were to make 4 × 4 KiB sequential I/O requests, EBS is likely to count that as 1 I/O anyway
an I/O size of 4KiB and extremely random I/Os would indeed not be combined, so would theoretically perform poorly relative to the same number of 16KiB extremely random I/Os, but instinct and experience tells me this borders on academic and theoretical territory except perhaps in extremely rare cases. It could just as likely hurt as help, since small writes would use the same number of IOPS but transfer more unnecessary data across the wire.
if your I/Os are larger than 16KiB, your maximum IOPS will decrease if your disk bandwidth reaches the 160MiB/s threshold before reaching the IOPS threshold.
A final thought, EBS performs best under load. That is to say, a single thread making a series of random I/Os will not keep the EBS volume's queue filled with requests. When that is not the case, you will not see the maximum possible performance.
See also Amazon EBS Volume Performance on Linux Instances for more discussion of EBS performance.

AWS t2.medium performance issues after adding 32 gb volume

On AWS EC2 t2.medium instance we run a site http://www.pricesimply.com/
Our database is installed on the same machine.
By default we had 8 gb storage and the site speed was lightning quick.
Then, we added a 32 gb volume Type - general purpose volume.
Only difference between these 2 volumes -
8 gb default volume - IOPS 24/3000
32 gb new added volume - IOPS 96/3000
Volume type & Availability zones are same.
The site MUCH SLOWER now compared to earlier.

Some random ideas:
1) Performance does vary between volumes. Do some benchmarks to see if it's really slower. (Unlikely, but possible.)
2) Perhaps the volume is a red herring -- Maybe your entire dataset was small enough to fit into RAM before, and now that you've expanded and grown, your data doesn't fit, creating constant I/O?
3) If the drive was created from a snapshot, it may be fetching your data from your snapshot in the background, slowing the drive.

Adding an additional disk shouldn't slow down your machine, you'll need to investigate things a bit more to identify the bottleneck.
Poor performing infrastructure generally fits into one of three categories:
CPU: Check your CPU utilization to see if the t2.medium instance is a suitable size. Amazon CloudWatch can show you CPU history.
Memory (RAM): Your application may be short on memory, causing page swaps to disk. You'll need to monitor memory utilization from within your instance. (CloudWatch cannot see memory utilization.)
Disk IO: If you are reading & writing to disk a lot, then this could be your bottleneck. CloudWatch can give you some metrics, especially the Queue Length, which indicates that IO was waiting to be processed.
Once you've identified which of these three factors appears to be the bottleneck, try to improve them:
CPU: Use a larger instance type
Memory: Use an instance type with more RAM
Disk: Use a faster disk
You are using General Purpose (SSD) EBS volumes. These volumes have an IOPS (Input/Output per second) related to volume size. So, your "96/3000" volume gives a guaranteed 96 IOPS (about the speed of a magnetic hard disk) with the ability to burst up to 3000 IOPS if you have enough IO 'credits'. If you are continually using more than 96 IOPS, you will run out of credits and will be limited to 96 IOPS.
See: Amazon EBS Volume Types

Same storage&IOPS, why provisioned IOPS ssd cost more than General ssd?

From the aws ec2 CALCULATOR(http://calculator.s3.amazonaws.com/index.html):
General purpose ssd
1000G storage,
3000 IOPS,
$97/month
Provisioned IOPS ssd
1000G storage,
3000 IOPS,
$320/month
My questions
If I attach 1000G General ssd to a ec2 instance, and used 100G, what IOPS I really get? 3000 or 300?
If question1 is 3000, in what conditions we should use provisioned IOPS ssd, while we can increase IOPS by adding storage at lower cost?

The baseline performance is 3 IOPS per GB of storage and can burst up to 3000 IOPS per volume. However the burst probably wont come into play. Since baseline for 1000GB volume is 3000 IOPS.
When you need more than 3000 IOPS. With provisioned you can get up to 30 IOPS per GB.
The instance size you attach these volumes to will also affect performance.

Mysql replication & Amazon EC2 SSD vs. Magnetic Disk

We have a replication setup on Amazon EC2 that used Magnetic disks (15GB) that we swapped with SSD disks (15GB) for each the replication servers. We noticed that the slaves would fall behind the master and never catch up with these new SSD disks. This is something that never happened with the Magnetic disks but happened on each and every SSD disk.
We decided to try and move the databases back to Magnetic disks after the SSD disks fell more than 2 days behind. Within 2 hours the slave completely caught up.
I thought that SSD disks were more efficient, and all around better than Magnetic disks and that is why Amazon decided to make them standard.
Another bit of information is that we are using Micro instances, but the only changes we made was with the attached disk.
Anyone have any ideas?

I think you might be bumping up against the max IOPS for your 15GB SSD drive.. Amazon only allows an average of 45 IOPS for that disk size. On a magnetic drive, they have an extra charge when you use more IOPS, it does not seem to be throttled in the same way:
From Amazon's Pop Up on IOPS:
The number of input-output operations per second. For Provisioned IOPS (SSD) volumes, you can specify the IOPS rate when you create the volume. The ratio of IOPS provisioned and the volume size requested can be a maximum of 30 (in other words, a volume with 3000 IOPS must be at least 100 GB). General Purpose (SSD) volume types have a baseline IOPS of volume size X 3 and can burst up to 3000 IOPS for 30 minutes.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js