Background
I have an AWS managed Elasticsearch v6.0 cluster that has 14 data instances.
It has time based indices like data-2010-01, ..., data-2020-01.
Problem
Free storage space is very unbalanced across instances, which I can see in the AWS console.
I have noticed this distribution changes every time the AWS service runs through a blue-green deploy.
This happens when cluster settings are changed or AWS releases an update.
Sometimes the blue-green results in one of the instances completely running out of space.
When this happens the AWS service starts another blue-green and this resolves the issue without customer impact. (It does have impact on my heart rate though!)
Shard Size
Shard sizes for our indices are in the gigabytes, but below the Elasticsearch recommendation of 50GB.
The shard size does vary by index, though. Lots of our older indices have only a handful of documents.
Question
It is unexpected that the AWS balancing algorithm does not balance well and that it produces a different result each time.
My question is how does the algorithm choose which shards to allocate to which instance and can I resolve this imbalance myself?
I asked this question of AWS support who were able to give me a good answer so I thought I'd share the summary here for others.
In short:
AWS Elasticsearch distributes shards based on shard count rather than shard size so keep your shard sizes balanced if you can.
If you have your cluster configured to be spread across 3 availability zones, make your data instance count divisible by 3.
My Case
Each of my 14 instances gets ~100 shards instead of ~100 GB each.
Remember that I have a lot of relatively empty indices.
This translates to a mixture of small and large shards which causes the imbalance when AWS Elasticsearch (inadvertently) allocates lots of large shards to an instance.
This is further worsened by the fact that I have my cluster set to be distributed across 3 availability zones and my data instance count (14) is not divisible by 3.
Increasing my data instance count to 15 (or decreasing to 12) solved the problem.
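The count-versus-size effect described above can be reproduced with a small simulation. This is only a sketch: round-robin placement stands in for "balance by shard count" and is not the actual AWS algorithm, and the shard mix (many tiny shards plus some 20-40 GB ones) is a made-up example.

```python
import random

def allocate_by_count(shard_sizes_gb, node_count):
    """Place shards round-robin so every node holds the same number of shards.

    This is an illustrative stand-in for count-based balancing, not the
    real allocator used by the service.
    """
    nodes = [[] for _ in range(node_count)]
    for i, size in enumerate(shard_sizes_gb):
        nodes[i % node_count].append(size)
    return nodes

rng = random.Random(0)
# Hypothetical mix: 1120 nearly empty shards and 280 large ones.
sizes = [0.01] * 1120 + [rng.uniform(20, 40) for _ in range(280)]
rng.shuffle(sizes)

nodes = allocate_by_count(sizes, 14)
counts = [len(n) for n in nodes]   # shards per node
used = [sum(n) for n in nodes]     # GB per node

print("shards per node:", min(counts), "to", max(counts))
print("GB per node:", round(min(used)), "to", round(max(used)))
```

Every node ends up with exactly 100 shards, yet the storage used per node differs because the number of large shards each node happens to receive is left to chance.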
From the AWS Elasticsearch docs on Multi-AZ:
To avoid these kinds of situations, which can strain individual nodes and hurt performance, we recommend that you choose an instance count that is a multiple of three if you plan to have two or more replicas per index.
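The arithmetic behind this recommendation can be sketched as follows. The sketch assumes that every zone ends up holding the same number of shards (which is roughly what happens when each zone holds a full copy of the data); the figure of 600 shards per zone is a hypothetical value chosen to keep the numbers round.

```python
def nodes_per_zone(node_count, zone_count=3):
    """Spread nodes over zones as evenly as possible."""
    base, extra = divmod(node_count, zone_count)
    return [base + (1 if z < extra else 0) for z in range(zone_count)]

def shards_per_node(shards_per_zone, node_count, zone_count=3):
    """Shards each node must hold, per zone, if every zone holds
    the same number of shards (an assumption of this sketch)."""
    return [shards_per_zone / n for n in nodes_per_zone(node_count, zone_count)]

print(nodes_per_zone(14))        # [5, 5, 4]
print(shards_per_node(600, 14))  # [120.0, 120.0, 150.0]
print(shards_per_node(600, 15))  # [120.0, 120.0, 120.0]
```

With 14 nodes, the zone that only has 4 nodes must fit the same number of shards onto fewer machines, so each of its nodes carries 25% more than the nodes in the other zones. With 15 nodes the load is even.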
Further Improvement
On top of the availability zone issue, I suggest keeping index sizes balanced to make it easier for the AWS algorithm.
In my case I can merge older indexes, e.g. data-2019-01 ... data-2019-12 -> data-2019.
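As a rough sketch of how the merge could be prepared, the helper below groups monthly index names by year and builds one request body per year in the shape accepted by the Elasticsearch `_reindex` API (`source.index` may be a list). The helper name `yearly_reindex_bodies` is my own, the bodies are only built and not sent, and the monthly indices would still need to be verified and deleted separately afterwards.

```python
from collections import defaultdict

def yearly_reindex_bodies(index_names, prefix="data-"):
    """Group monthly indices (e.g. data-2019-01) by year and build
    one _reindex request body per yearly target index."""
    by_year = defaultdict(list)
    for name in index_names:
        if not name.startswith(prefix):
            continue  # ignore indices that do not follow the naming scheme
        year = name[len(prefix):len(prefix) + 4]
        by_year[year].append(name)
    return {
        f"{prefix}{year}": {
            "source": {"index": sorted(names)},
            "dest": {"index": f"{prefix}{year}"},
        }
        for year, names in by_year.items()
    }

bodies = yearly_reindex_bodies(["data-2019-02", "data-2019-01", "data-2020-01"])
for target, body in bodies.items():
    print(target, body)
```

Each body would then be sent as the JSON payload of a `POST _reindex` request, one year at a time.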
Related
I have a regional application which is made up of several duplicate zonal applications for HA. Each zonal application is made up of many instances, and I want each individual zonal application to be as high availability as possible. Ideally, no more than a certain percent of VMs should fail at once in a given AZ (otherwise we would treat it as a total zonal failure and fail away from that AZ).
I want to use partition placement groups with as close as possible to an equal number of instances in each partition to improve our chances of meeting the “no more than X% simultaneous failures” goal. I also want to use autoscaling groups to scale the zonal application in and out according to demand.
Is there a way to configure autoscaling groups to achieve my goal of roughly-equal partition sizes within a zone? I considered configuring one ASG per partition and relying on my load balancing to spread load equally across them, but if they get unbalanced for some reason I think they would stay unbalanced.
I have been trying to auto-scale a 3node Cassandra cluster with Replication Factor 3 and Consistency Level 1 on Amazon EC2 instances.
What steps do I need to perform to add/remove nodes to the cluster dynamically based on the load on the application?
Unfortunately scaling up and down responding to the current load is not straightforward, and if you have a cluster with a large amount of data, this won't be possible:
You can't add multiple nodes to a cluster simultaneously; all the operations need to be sequential.
Adding or removing a node requires streaming data in or out of the node. How long this takes depends on the size of your data and on the EC2 instance type you are using (because of its network bandwidth limit). It also differs depending on whether you use instance storage or EBS (EBS will limit you in IOPS).
You mentioned that you are using AWS and a replication factor of 3. Are you also using different availability zones (AZs)? If you are, the EC2Snitch will work to ensure that the data is balanced between them for resilience, so when you scale up and down you will need to keep an even distribution of nodes between AZs.
The scale operations rearrange the distribution of tokens. Once that is complete you will need to run a cleanup (nodetool cleanup) to remove data that the node no longer owns; this operation also takes time. This is important to keep in mind if you are scaling up because you are running out of space.
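To get a feel for why streaming makes reactive scaling slow, here is a back-of-the-envelope estimate of the transfer time alone. It is a sketch under stated assumptions: the `efficiency` factor (how much of the nominal bandwidth is actually usable) is a hypothetical value, and disk or IOPS limits, compaction and cleanup time are ignored, so real operations will take longer.

```python
def streaming_hours(data_gb, bandwidth_mbit_s, efficiency=0.7):
    """Rough lower bound on the hours needed to stream data_gb to or
    from a node over a link of the given nominal bandwidth.

    efficiency is an assumed fraction of the bandwidth that is usable.
    """
    usable_mbit_s = bandwidth_mbit_s * efficiency
    megabits = data_gb * 8 * 1000  # decimal GB -> megabits
    return megabits / usable_mbit_s / 3600

# Example: 500 GB per node on a 1 Gbit/s instance.
print(round(streaming_hours(500, 1000), 1), "hours")
```

For 500 GB on a 1 Gbit/s link this comes to about 1.6 hours per node, and since nodes have to be added one at a time, the total grows with each additional node.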
For our use case, we are getting good results with a proactive approach: we have set up an aggressive alerting/monitoring strategy for early detection, so we can start the scale-up operations before there is any performance impact. If your application or use case has a predictable usage pattern, that can also help you take action in preparation for periods of high workload.
I have an EC2 and RDS in the same region US East(N. Virginia) but both resources are in different zones; RDS in us-east-1a and EC2 in us-east-1b.
My question is: if I put both resources in the same zone, would it speed up the data transfer to/from the DB? I receive around 20k-30k entries daily from the app on this instance.
EDIT
I read here that:
Each Availability Zone is isolated, but the Availability Zones in a region are connected through low-latency links.
Now I am wondering whether the latency of these links is negligible, or whether I should consider moving my resources into the same zone to speed up the data transfer.
Conclusion
As discussed in answers and comments:
Since I have only one instance of EC2 and RDS, failure of one service in a zone will affect the whole system. So there is no advantage to keeping them in separate zones.
Although zones are connected with low-latency links, there is still some latency, which is negligible in my case.
There is also a minor data transfer charge of USD 0.01/GB between EC2 and RDS in different zones.
What are typical values for Interzone data transfers in the same region?
Although AWS will not guarantee, state, or otherwise commit to hard numbers, typical measurements are under 10 ms; I have seen numbers around 3 ms.
How does latency affect data transfer throughput?
The higher the latency the lower the maximum bandwidth. There are a number of factors to consider here. An excellent paper was written by Brad Hedlund.
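One of those factors can be shown with simple arithmetic: a single TCP connection can have at most one receive window of data in flight per round trip, so its throughput is bounded by window size divided by round-trip time. The sketch below assumes a fixed 64 KiB window with no window scaling; modern stacks scale the window, so treat the numbers as an illustration of the relationship rather than as a prediction for a real connection.

```python
def max_tcp_throughput_mbit_s(window_bytes, rtt_ms):
    """Upper bound on single-connection TCP throughput:
    one window of data per round trip."""
    return (window_bytes * 8) / (rtt_ms / 1000) / 1_000_000

WINDOW = 64 * 1024  # assumed fixed window of 64 KiB, no window scaling

for rtt_ms in (0.6, 3, 10):
    print(f"RTT {rtt_ms} ms -> {max_tcp_throughput_mbit_s(WINDOW, rtt_ms):.1f} Mbit/s")
```

Under these assumptions the ceiling drops from roughly 874 Mbit/s at 0.6 ms to about 52 Mbit/s at 10 ms.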
Should I worry about latency in AWS networks between zones in the same region?
Unless you are using the latest instances with very high performance network adapters (10 Gb or higher) I would not worry about it. The benefits of fault tolerance should take precedence except for the most specialized cases.
For your use case, database transactions, the difference between 1 ms and 10 ms will have minimal impact, if at all, on your transaction performance.
However, unless you are using multiple EC2 instances in multiple zones, you want your single EC2 instance in the same zone as RDS. If you are in two zones, the failure of either zone brings down your configuration.
There are times where latency and network bandwidth are very important. For this specialized case, AWS offers placement groups so that the EC2 instances are basically in the same rack close together to minimize latency to the absolute minimum.
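To put the 1 ms versus 10 ms comparison above in perspective for a workload of 20k-30k entries a day, the total extra waiting time can be estimated as below. The sketch assumes one database round trip per entry, which is a guess; the `round_trips_per_entry` parameter is there to adjust it.

```python
def added_seconds_per_day(entries_per_day, extra_latency_ms, round_trips_per_entry=1):
    """Total extra time spent waiting on the network per day."""
    return entries_per_day * round_trips_per_entry * extra_latency_ms / 1000

# 30,000 entries a day, 9 ms extra per round trip (10 ms instead of 1 ms).
print(added_seconds_per_day(30_000, 9), "seconds per day")
```

Even in this pessimistic case the extra latency adds up to 270 seconds spread across an entire day.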
Moving the resources to the same AZ would decrease latency by very little. See here for some unofficial benchmarks. For your use-case of 20k reads/writes per day, this will NOT make a huge difference.
However, moving resources to the same AZ would significantly increase reliability in your case. If you only have 1 DB and 1 Compute Instance that depend on each other, then there is no reason to put them in separate availability zones. With your current architecture, a failure in either us-east-1a or us-east-1b would bring down your project. Unless you plan on scaling out your project to have multiple DBs and Compute Instances, they should both reside in the same AZ.
According to some tests, I see about 600 microseconds (0.6 ms) of latency between availability zones inside the same region. Fiber adds about 5 microseconds of delay (latency) per km, and the distance between AZs is less than 100 km, so the result matches.
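The propagation-delay estimate can be checked with a few lines of arithmetic. The 5 microseconds per km figure is the one quoted above; whether the 0.6 ms measurement is one-way or round-trip is not stated, so the sketch covers both readings, and it ignores switching and queuing delay.

```python
FIBER_DELAY_US_PER_KM = 5  # figure quoted in the answer above

def one_way_delay_us(distance_km):
    """Propagation delay over a fiber path of the given length."""
    return distance_km * FIBER_DELAY_US_PER_KM

def round_trip_delay_us(distance_km):
    return 2 * one_way_delay_us(distance_km)

def implied_distance_km(delay_us, round_trip=True):
    """Fiber path length that would explain a measured delay."""
    one_way_us = delay_us / 2 if round_trip else delay_us
    return one_way_us / FIBER_DELAY_US_PER_KM

print(one_way_delay_us(100), round_trip_delay_us(100))
print(implied_distance_km(600, round_trip=True))
```

If the 600 microseconds is a round-trip time, it corresponds to a fiber path of about 60 km, which is consistent with zones being less than 100 km apart.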
I have created a few AWS EC2 instances. However, sometimes my data throughput (for both upload and download) becomes highly limited on certain servers.
For example, I typically get about 15-17 MB/s throughput from an instance located in US West (Oregon). However, sometimes, especially when I transfer a large amount of data in a single day, my throughput drops to 1-2 MB/s. When this happens on one server, the other servers still have their typical network throughput (as expected).
How can I avoid it? And what can cause this?
If it is due to the amount of data I upload/download, how can I avoid it?
At the moment, I am using t2.micro type instances.
Simple answer: don't use micro instances.
AWS is a multi-tenant environment, so resources are shared. When it comes to network performance, the larger instance sizes get higher priority. Only the largest instances get any sort of dedicated performance.
Micro and nano instances get the lowest priority out of all instance types.
This matrix will show you what priority each instance size gets:
https://aws.amazon.com/ec2/instance-types/#instance-type-matrix
Can I put an Aerospike cluster under AWS Auto Scaling? For example, my initial auto scaling group size would be 3, and if more traffic comes in and CPU utilization rises above 80%, another instance would be added to the cluster. Is this possible? Does it have any disadvantages, or will it create any problems in the cluster?
There's an Amazon CloudFormation script at aerospike/aws-cloudformation that gives an example of how to launch such a cluster.
However, the point of autoscale is to grow shared-nothing worker nodes, such as webapps. These nodes typically don't have any shared data on them, you simply launch a new one and it's ready to work.
The point of adding a node to a distributed database like Aerospike is to have more data capacity, and to even out the data across more nodes, which gives you an increased ability to handle operations (reads, writes, etc). Autoscaling Aerospike would probably not work as you expect, because when a node is added to the cluster a new (larger) cluster is formed and the data is automatically rebalanced. Part of balancing is migrating partitions of data between nodes, and it ends when the number of partitions across each node is even once again (and therefore the data is evenly spread across all the nodes of the cluster). Migrations are heavy, taking up network bandwidth.
This would work if you could time it to happen ahead of the traffic peaking, because then migrations could be completed ahead of time, and your cluster would be ready for the next peak. You would not want to do this as peak traffic is occurring, because it would only make things worse. You also want to make sure that when the cluster contracts there is enough room for the data, and enough DRAM for the primary index, as the per-node usage of both will grow.
One more point of having extra capacity in Aerospike is to allow for rolling upgrades, where one node goes through upgrade at a time without needing to take down the entire cluster. Aerospike is typically used for realtime applications that require no downtime. At a minimum your cluster needs to be able to handle a node going down and have enough capacity to pick up the slack.
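The capacity check implied by the last two paragraphs can be sketched as follows. It assumes data is spread evenly across nodes after migrations finish and uses a hypothetical `max_fill` threshold of 80% to leave headroom; the function names are made up for illustration. The same reasoning would apply to DRAM for the primary index.

```python
def per_node_usage_gb(total_data_gb, node_count):
    """Per-node usage once data is evenly balanced (replicas included
    in total_data_gb)."""
    return total_data_gb / node_count

def can_lose_nodes(total_data_gb, node_count, node_capacity_gb,
                   nodes_lost=1, max_fill=0.8):
    """True if the remaining nodes can hold all data without exceeding
    max_fill of their capacity (max_fill is an assumed safety margin)."""
    remaining = node_count - nodes_lost
    if remaining <= 0:
        return False
    return per_node_usage_gb(total_data_gb, remaining) <= node_capacity_gb * max_fill

# Example: 900 GB of data (with replicas) on 4 nodes of 400 GB each.
print(per_node_usage_gb(900, 4))               # usage with all nodes up
print(can_lose_nodes(900, 4, 400))             # one node down
print(can_lose_nodes(900, 4, 400, nodes_lost=2))
```

In this example the cluster tolerates one node going down (300 GB per node against a 320 GB limit) but not two, so contracting by two nodes or losing a second node during a rolling upgrade would not leave enough room.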
Just as a note, you have fine-grained configuration control over the rate at which migrations happen, but they run longer if you make the process less aggressive.