add node to existing aerospike cluster using autoscale - amazon-web-services

Can I run an Aerospike cluster under AWS autoscale? For example, my initial autoscale group size will be 3; if more traffic comes in and CPU utilization goes above 80%, it will add another instance to the cluster. Do you think this is possible? Does it have any disadvantages, or will it create any problems in the cluster?

There's an Amazon CloudFormation script at aerospike/aws-cloudformation that gives an example of how to launch such a cluster.
However, the point of autoscale is to grow shared-nothing worker nodes, such as webapps. These nodes typically don't have any shared data on them; you simply launch a new one and it's ready to work.
The point of adding a node to a distributed database like Aerospike is to gain more data capacity and to even out the data across more nodes, which gives you an increased ability to handle operations (reads, writes, etc.). Autoscaling Aerospike would probably not work as you expect, because when a node is added to the cluster a new (larger) cluster is formed and the data is automatically rebalanced. Part of balancing is migrating partitions of data between nodes, and it ends when the number of partitions on each node is even once again (and therefore the data is evenly spread across all the nodes of the cluster). Migrations are heavy, taking up network bandwidth.
This would work if you could time it to happen ahead of the traffic peaking, because then migrations could be completed ahead of time and your cluster would be ready for the next peak. You would not want to do this while peak traffic is occurring, because it would only make things worse. You also want to make sure that when the cluster contracts there is enough room for the data and enough DRAM for the primary index, as the per-node usage of both will grow.
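If your peaks are predictable, one option (a minimal sketch only; the group name, schedule, and sizes below are assumptions) is a scheduled scaling action on the Auto Scaling group via boto3, so migrations finish before the peak:

import boto3

autoscaling = boto3.client("autoscaling")

# Grow the group well before the assumed daily peak so Aerospike
# migrations can complete while traffic is still low.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="aerospike-asg",        # assumed group name
    ScheduledActionName="scale-up-before-peak",
    Recurrence="0 7 * * *",                      # cron, UTC
    DesiredCapacity=4,
)

# Shrink back after the peak; verify the remaining nodes have enough
# disk and DRAM headroom for the rebalanced data before doing this.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="aerospike-asg",
    ScheduledActionName="scale-down-after-peak",
    Recurrence="0 19 * * *",
    DesiredCapacity=3,
)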
One more reason to have extra capacity in Aerospike is to allow for rolling upgrades, where one node at a time goes through the upgrade without needing to take down the entire cluster. Aerospike is typically used for real-time applications that require no downtime. At a minimum, your cluster needs to be able to handle a node going down and have enough capacity to pick up the slack.
Just as a note, you have fine-grained configuration control over the rate at which migrations happen, but migrations run longer if you make the process less aggressive.
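For example, migration aggressiveness can be changed dynamically. Here is a minimal sketch using the Aerospike Python client (the host address is an assumption, and the exact parameter names vary by server version):

import aerospike

# Connect to any node; the info command is broadcast to the whole cluster.
client = aerospike.client({"hosts": [("10.0.0.1", 3000)]}).connect()

# Throttle migrations to a single thread per node: rebalancing uses less
# network bandwidth but takes longer to finish.
print(client.info_all("set-config:context=service;migrate-threads=1"))

client.close()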

Related

In GKE, how to minimize connect time with Load balancer

In GKE, for cost saving, I usually set the node number to zero. When I autoscale the nodes (or, say, add them) and run the pods, it takes more than 6-7 minutes for the Load Balancer to connect and for the URL to come up; that's why the health checks stay in the waiting state. Is there any way to reduce this time? Thanks
If Cloud Functions is not an option, you might want to look at Cloud Run (which supports containers and scales to zero) or GKE Autopilot (which does not scale to zero, but you can scale down to low resources and it will autoscale up and down as needed).
In short, not really. The spin-up time of nodes is not easily controlled: it is basically the time it takes for the VM to be allocated, turned on, and boot the OS, plus some Kubernetes-related work (configuration, joining the node pool, etc.), and this takes time. On top of that comes the pods' spin-up time, which depends on the Docker image (size, dependencies, etc.).
Scaling your application down to zero nodes is not really recommended. It is always recommended to have some nodes up (don't you have other apps running on the GKE cluster? Kubernetes clusters are recommended to have at least 3 nodes running).
Have you considered using Cloud Functions? Is it possible in your case? This is the best option I know of for a quick scale-up and scale-to-zero.
And in general you can keep some kind of “ping” to the function to keep it “hot” for a relatively cheap price.
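As a minimal sketch of such a keep-warm ping (the function URL and interval are assumptions; in practice you'd trigger this from something like Cloud Scheduler rather than a long-running loop):

import time
import urllib.request

FUNCTION_URL = "https://REGION-PROJECT.cloudfunctions.net/my-func"  # assumed

while True:
    # A lightweight GET keeps at least one instance warm.
    urllib.request.urlopen(FUNCTION_URL, timeout=10).read()
    time.sleep(300)  # ping every 5 minutes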
If none of the options above is possible (I'd say keeping your node pool with at least 3 nodes operating is best, as it takes time for the Kubernetes control plane to boot), I suggest starting by reducing the spin-up time of your pods by improving the Docker image - reducing its size, etc.
Here are some articles on how to reduce Docker image size
https://phoenixnap.com/kb/docker-image-size
https://www.ardanlabs.com/blog/2020/02/docker-images-part1-reducing-image-size.html
After that, I would experiment with different machine types for the nodes to check which one spins up the fastest - this could be an interesting thing to do in any case.
Here is an interesting comparison on VM spinning up times
https://www.google.com/amp/s/blog.cloud66.com/part-2-comparing-the-speed-of-vm-creation-and-ssh-access-on-aws-digitalocean-linode-vexxhost-google-cloud-rackspace-packet-cloud-a-and-microsoft-azure/amp/

Problems with Memory and CPU limits in AWS ECS cluster running on reserved EC2 instance

I am running an ECS cluster that currently has 3 services on a T3 medium instance. Each of those services runs only one task, which has a soft memory limit of 1GB; the hard limit is different for each (but that should not be the problem). I will always have enough memory to run one newly deployed task (the new one will also take 1GB, and the T3 medium can handle it since it has 4GB total). After the new task is up and running, the old one is stopped and I again have 1GB free for the next deployment. I did something similar with the CPU (2048 CPU units, each task has 512, and 512 free for new deployments).
So everything runs fine now, but I am not completely satisfied with this setup for the future. What will happen if I need to add another service with another task? I would need to redeploy all existing tasks and modify their task definitions to use less CPU and memory in order to make room for the new task (and new deployments). I am planning to get a reserved EC2 instance, so it will not be easy to swap the current EC2 instance for a larger one.
Is there a way to spin up another EC2 instance for the same ECS cluster to handle bursts in my tasks? The same goes for deployments: it's not a perfect scenario to only be able to deploy one task and then wait for the old one to be killed before deploying the next one, all without downtime.
My biggest concern: what if I need a new service and task? I would again have to adjust all the others in order to run and deploy the new one, which is not very maintainable. And what if I cannot lower CPU and memory any further because I have already reached the lowest point at which the tasks run smoothly?
I was thinking about having another EC2 instance for the same cluster to handle bursts, deployments, and new services/tasks, but I'm not sure if that's possible or if it's the best way of doing this. I was also thinking about Fargate, but it is much more expensive and I cannot afford it for now. What do you think? Any ideas, suggestions, and hints will be helpful, since I am desperate to find the best way to avoid the problems mentioned above.
Thanks in advance!
So unfortunately, there is no out-of-the-box solution to ensure that all your tasks run on the minimum possible number of instances (i.e. one). You can use our new feature called Capacity Providers (CP), which will allow you to ensure the minimum number of EC2 instances required to run all your tasks. The major difference between a CP and a plain ASG is that a CP gives more weight to task placement (whereas an ASG will scale in/out based on resource utilization, which isn't ideal in your case).
However, it's not an ideal solution. Just as you said in your comment, when the service needs to scale out during a deployment, the CP will spin up another instance, the new task will be placed on it, and once it reaches the Running state, the old task will be stopped.
But now you have an "extra" EC2 instance, because there is no way to replace a running task. The only way I can think of would be to use a Lambda function that drains the new instance, which moves all the service tasks to the other instance. The CP will then, after about 15 minutes, terminate this instance, as no tasks are running on it.
A couple of caveats:
- CPs are new, a little rough around the edges, and you can't delete/modify them. You can only create or deactivate them.
- A CP needs an underlying ASG, and they must have a 1-1 relationship.
- Make sure to enable managed scaling when creating the CP.
- Choose a 100% capacity target.
- Don't forget to add a default capacity provider strategy for the cluster (see the sketch below).
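As a rough sketch of those last three caveats with boto3 (the cluster, provider, and ASG names/ARN are assumptions):

import boto3

ecs = boto3.client("ecs")

# Create the capacity provider on top of an existing ASG, with managed
# scaling enabled and a 100% capacity target.
ecs.create_capacity_provider(
    name="my-cp",
    autoScalingGroupProvider={
        "autoScalingGroupArn": "arn:aws:autoscaling:us-east-1:123456789012:"
            "autoScalingGroup:uuid:autoScalingGroupName/my-asg",
        "managedScaling": {"status": "ENABLED", "targetCapacity": 100},
    },
)

# Attach it to the cluster and make it the default strategy.
ecs.put_cluster_capacity_providers(
    cluster="my-cluster",
    capacityProviders=["my-cp"],
    defaultCapacityProviderStrategy=[{"capacityProvider": "my-cp", "weight": 1}],
)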
Minimizing EC2 instances used:
If you're using a capacity provider, the 'binpack' placement strategy minimises the number of EC2 hosts that are used.
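A minimal sketch of setting that strategy when creating a service with boto3 (all names are assumptions):

import boto3

ecs = boto3.client("ecs")

# Pack tasks onto as few instances as possible, by least remaining memory.
ecs.create_service(
    cluster="my-cluster",
    serviceName="my-service",
    taskDefinition="my-task:1",
    desiredCount=3,
    placementStrategy=[{"type": "binpack", "field": "memory"}],
)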
However, there are some scale-in scenarios where you can end up with a single task running on its own EC2 instance. As Ali mentions in their answer, ECS will not replace this running task, but depending on your setup, it may be fairly easy for you to replace it yourself by configuring your task to voluntarily 'quit'.
In my case; I always have at least 2 tasks running per service. So I just added some logic to my tasks' healthchecks, so they report as unhealthy after ~6 hours. ECS will spot the 'unhealthy' task, remove it from the load balancer, and spin up a replacement (according to the binpack strategy).
Note: If you take this approach, add some variation to your timeout so you're less likely to have all of your tasks expire at the same time. Something like: expiry = now + timedelta(hours=random.uniform(5.5, 6.5))
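A minimal sketch of that healthcheck expiry, assuming a Flask app serving the load balancer's health check path:

import random
from datetime import datetime, timedelta

from flask import Flask

app = Flask(__name__)

# Jittered expiry so all tasks don't recycle at the same moment.
expiry = datetime.utcnow() + timedelta(hours=random.uniform(5.5, 6.5))

@app.route("/health")
def health():
    if datetime.utcnow() > expiry:
        # Report unhealthy: ECS drains this task and binpacks a replacement.
        return "expired", 503
    return "ok", 200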
Sharing memory 'headspace' with soft-limits:
If you set both soft and hard memory limits, ECS will place your tasks based on the soft limit. If your tasks' memory usage varies with load, it's fairly easy to get your EC2 instance to start swapping.
For example: Say you have a task defined with a soft limit of 900mb, and a hard limit of 1800mb. You spin up a service with 4 running instances. ECS provisions all 4 of these instances on a single t3.medium. Notice here that each instance thinks it can safely use up to 1800mb, when in fact there's very little free memory on the host server. When you hit your service with some traffic; each task tries to use some more memory, and your t3.medium is incapacitated as it starts swapping memory to disk. ECS does not recover from this type of failure very well. It notices that the task instances are no longer available, and will attempt to provision replacements, but the capacity provider is very slow to replace the swapping t3.medium.
My suggestion:
Configure your service to auto-scale based on memory usage (this will be a percentage of your soft-limit), for example: a target memory usage of 70%
Configure your tasks' healthchecks so that they report as unhealthy when they are nearing their soft-limit. This way, your tasks still have some headroom for quick spikes of memory usage, while giving your load balancer a chance to drain and gracefully replace tasks that are getting greedy. This is fairly easy to do by reading the value within /sys/fs/cgroup/memory/memory.usage_in_bytes.
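A sketch of that check, assuming the cgroup v1 layout used on ECS EC2 hosts and a hypothetical 900mb soft limit:

SOFT_LIMIT_BYTES = 900 * 1024 * 1024
UNHEALTHY_RATIO = 0.9  # report unhealthy at 90% of the soft limit

def memory_usage_bytes() -> int:
    # Current memory usage of this task's cgroup.
    with open("/sys/fs/cgroup/memory/memory.usage_in_bytes") as f:
        return int(f.read())

def healthcheck_ok() -> bool:
    # Fail the healthcheck before the soft limit is reached, so the load
    # balancer can drain and replace the task gracefully.
    return memory_usage_bytes() < UNHEALTHY_RATIO * SOFT_LIMIT_BYTES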

How to autoscale a Cassandra cluster on AWS, or how to add nodes dynamically to the cluster depending on the load

I have been trying to auto-scale a 3-node Cassandra cluster with Replication Factor 3 and Consistency Level 1 on Amazon EC2 instances.
What steps do I need to perform to add/remove nodes to the cluster dynamically, based on the load on the application?
Unfortunately, scaling up and down in response to the current load is not straightforward, and if you have a cluster with a large amount of data, it won't be possible:
- You can't add multiple nodes simultaneously to a cluster; all the operations need to be sequential.
- Adding or removing a node requires streaming data in or out of the node; how long this takes will depend on the size of your data, as well as on the EC2 instance type you are using (for the network bandwidth limit). There will also be differences depending on whether you are using instance storage or EBS (EBS will limit you in IOPS).
- You mentioned that you are using AWS and a replication factor of 3; are you also using different availability zones (AZs)? If you are, the EC2Snitch will work to ensure that the information is balanced between them in order to be resilient, so when you are scaling up and down you will need to keep an even distribution between AZs.
- The scale operations will cause a rearrangement in the distribution of tokens; once that is completed, you will need to do a cleanup (nodetool cleanup) to remove data that is no longer owned by the node, and this operation will also take time. This is important to keep in mind if you are scaling up because you are running out of space.
For our use case, we are getting good results taking a proactive approach: we have set up an aggressive alert/monitoring strategy for early detection, so we can start the scale-up operations before there is any performance impact. If your application or use case has a predictable usage pattern, that can also help you take action in preparation for periods of high workload.
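As a sketch of such an early-warning alarm with boto3 (the thresholds, names, and SNS topic are assumptions to tune for your cluster):

import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm well below saturation (70% CPU) so the sequential, hours-long
# Cassandra scale-up can start before performance degrades.
cloudwatch.put_metric_alarm(
    AlarmName="cassandra-cpu-early-warning",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "AutoScalingGroupName", "Value": "cassandra-asg"}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=2,
    Threshold=70.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],
)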

Does AWS take down each availability zone (AZ) or whole regions for maintenance?

AWS has a maintenance window for each region (https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/maintenance-window.html), but I could not find any documentation about how it works with multiple AZs in the same region.
I have a Redis cache configured with a replica in a different AZ in the same region. The whole purpose of configuring the replica in a different AZ is that if one AZ is not available, traffic is served from the other AZ.
When they do maintenance, do they take down the whole region or individual availability zones?
You should read the FAQ on ElastiCache maintenance https://aws.amazon.com/elasticache/elasticache-maintenance/
This says that if you have a multi-AZ deployment, it will take down the instances one at a time, triggering a failover to the read replica, and then create new instances before taking down the rest, so you should not experience any interruption in your service.
Thanks #morras for the above link, which explains how the ElastiCache maintenance window works. Below are three questions I have taken from that link, with explanations.
1. How long does a node replacement take?
A replacement typically completes within a few minutes. The replacement may take longer in certain instance configurations and traffic patterns. For example, Redis primary nodes may not have enough free memory, and may be experiencing high write traffic. When an empty replica syncs from this primary, the primary node may run out of memory trying to address the incoming writes as well as sync the replica. In that case, the master disconnects the replica and restarts the sync process. It may take multiple attempts for the replica to sync successfully. It is also possible that the replica may never sync if the incoming write traffic continues to remain high.
Memcached nodes do not need to sync during replacement and are always replaced fast irrespective of node sizes.
2. How does a node replacement impact my application?
For Redis nodes, the replacement process is designed to make a best effort to retain your existing data and requires successful Redis replication. For single node Redis clusters, ElastiCache dynamically spins up a replica, replicates the data, and then fails over to it. For replication groups consisting of multiple nodes, ElastiCache replaces the existing replicas and syncs data from the primary to the new replicas. If Multi-AZ or Cluster Mode is enabled, replacing the primary triggers a failover to a read replica. If Multi-AZ is disabled, ElastiCache replaces the primary and then syncs the data from a read replica. The primary will be unavailable during this time.
For Memcached nodes, the replacement process brings up an empty new node and terminates the current node. The new node will be unavailable for a short period during the switch. Once switched, your application may see performance degradation while the empty new node is populated with cache data.
3. What best practices should I follow for a smooth replacement experience and minimize data loss?
For Redis nodes, the replacement process is designed to make a best effort to retain your existing data and requires successful Redis replication. We try to replace just enough nodes from the same cluster at a time to keep the cluster stable. You can provision primary and read replicas in different availability zones. In this case, when a node is replaced, the data will be synced from a peer node in a different availability zone. For single node Redis clusters, we recommend that sufficient memory is available to Redis, as described here. For Redis replication groups with multiple nodes, we also recommend scheduling the replacement during a period with low incoming write traffic.
For Memcached nodes, schedule your maintenance window during a period with low incoming write traffic, test your application for failover and use the ElastiCache provided "smarter" client. You cannot avoid data loss as Memcached has data purely in memory.
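A small sketch of pinning the maintenance window to a low-traffic period with boto3 (the replication group name and window are assumptions):

import boto3

elasticache = boto3.client("elasticache")

# Move maintenance to an assumed low-write-traffic window (UTC).
elasticache.modify_replication_group(
    ReplicationGroupId="my-redis",
    PreferredMaintenanceWindow="sun:05:00-sun:06:00",
    ApplyImmediately=True,
)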

Emr 4.2 Spot Prices on Core Nodes

Now that EMR supports downsizing of core nodes, if I create an EMR cluster with one of the core nodes as a spot instance, what happens when the spot price exceeds my bid price for that core node? Will it gracefully decommission that core node?
Here is Amazon's description of the process of shrinking the number of core nodes:
On core nodes, both YARN NodeManager and HDFS DataNode daemons must be decommissioned in order for the instance group to shrink. For YARN, graceful shrink ensures that a node marked for decommissioning is only transitioned to the DECOMMISSIONED state if there are no pending or incomplete containers or applications. The decommissioning finishes immediately if there are no running containers on the node at the beginning of decommissioning.
For HDFS, graceful shrink ensures that the target capacity of HDFS is large enough to fit all existing blocks. If the target capacity is not large enough, only a partial amount of core instances are decommissioned such that the remaining nodes can handle the current data residing in HDFS. You should ensure additional HDFS capacity to allow further decommissioning. You should also try to minimize write I/O before attempting to shrink instance groups, as that may delay the completion of the resize operation.
Another limit is the default replication factor, dfs.replication, inside /etc/hadoop/conf/hdfs-site. Amazon EMR configures the value based on the number of instances in the cluster: 1 with 1-3 instances, 2 for clusters with 4-9 instances, and 3 for clusters with 10+ instances. Graceful shrink does not allow you to shrink core nodes below the HDFS replication factor; this is to prevent HDFS from being unable to close files due to insufficient replicas. To circumvent this limit, you must lower the replication factor and restart the NameNode daemon.
I think it might not be possible to gracefully decommission the node in case of a Spot price spike (in the general case with N core nodes). There is a 2-minute notification available before the Spot instance is removed due to a price spike. Even if captured, this time period might not be sufficient to guarantee decommissioning of the HDFS data.
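A sketch of capturing that notice from inside the instance by polling the documented spot instance-action metadata endpoint (IMDSv1 shown for brevity; the drain step is left as a placeholder):

import time
import urllib.request
from urllib.error import HTTPError

URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"

while True:
    try:
        # Returns 404 until an interruption is scheduled, then a JSON
        # document with the action and time (~2 minutes of warning).
        notice = urllib.request.urlopen(URL, timeout=2).read()
        print("Interruption scheduled:", notice)
        break  # start whatever best-effort drain/decommission is possible
    except HTTPError:
        pass  # 404: no interruption pending
    time.sleep(5)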
Also, with only 1 core node in the cluster, decommissioning does not make much sense. The data held in the cluster needs to be moved to other nodes, which are not available in this case. Once the only available core node is lost, there needs to be a way to bring one back, otherwise the cluster cannot run any tasks.
Shameless plug :) : I work for Qubole!
The following 2 blog posts might be useful around integration of Spot instances with Hadoop clusters including dealing with Spot price spikes.
https://www.qubole.com/blog/product/riding-the-spotted-elephant
https://www.qubole.com/blog/product/rebalancing-hadoop-higher-spot-utilization