How to reserve a node in GKE Autopilot - google-cloud-platform

Is it possible to always keep one additional node reserved in GKE Autopilot, beyond the currently used nodes, to help reduce scaling time?
For example if we currently have 5 nodes and there is a spike in the application traffic, it often happens that the current 5 nodes don't have any more free resources to handle our application and there is a ~2 minute wait time until a new node is allocated.
We would like to always keep a free node allocated, so when there is a need to scale we can deploy the application fast (for a Node.js application with a 5-second start time, the extra 2-minute wait is a bit annoying).
Is it currently possible to keep an empty node allocated for such situations?

Yes, it is possible, but you will be paying extra for the allocated resources.
The idea is to create a separate Deployment for a 'placeholder' Pod with the lowest possible priority and the resources you need to allocate up front (CPU/memory; it should be at least the size of your largest Pod). You can add more replicas to allocate more resources and keep at least one node always on standby.
When there is a sudden spike in application traffic, this Pod will be preempted by the higher-priority deployment, and another node will be created if needed for the 'placeholder' Pod.
Please read this great article with .yaml snippets; a rough sketch is also shown below.
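Purely as an illustration (not taken from the linked article), a minimal placeholder setup could look like the following. The names, the PriorityClass value and the Pod size are assumptions; size the requests to match your largest Pod:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: placeholder-priority          # hypothetical name
value: -10                            # lower than the default 0 used by real workloads
preemptionPolicy: Never               # the placeholder never evicts real Pods
globalDefault: false
description: "Priority class for placeholder Pods that reserve spare capacity."
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: capacity-placeholder          # hypothetical name
spec:
  replicas: 1                         # raise this for more standby headroom
  selector:
    matchLabels:
      app: capacity-placeholder
  template:
    metadata:
      labels:
        app: capacity-placeholder
    spec:
      priorityClassName: placeholder-priority
      terminationGracePeriodSeconds: 0
      containers:
      - name: pause
        image: registry.k8s.io/pause  # does nothing; only holds the reservation
        resources:
          requests:
            cpu: "500m"               # assumed; use at least your largest Pod's requests
            memory: "1Gi"

When an application Pod with default (higher) priority cannot be scheduled, the scheduler preempts the placeholder, the application Pod starts right away on the already-provisioned node, and the autoscaler brings up a fresh node for the rescheduled placeholder.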

Related

Cloud Run, ideal vCPU and memory amount per instance?

When setting up a Cloud Run service, I am unsure how much memory and vCPU to assign per server instance.
I use Cloud Run for mobile apps.
I am confused about when to increase vCPU and memory instead of increasing server instances, and when to increase server instances instead of vCPU and memory.
How should I calculate it?
There isn't a good answer to that question. You have to know the limits:
The max number of requests that you can handle concurrently with 4 CPUs and/or 32 GB of memory (up to 1000 concurrent requests per instance)
The max number of instances on Cloud Run (1000)
Then it's a matter of trade-offs, and it's highly dependent on your use case.
Bigger instances reduce the number of cold starts (and thus the high latency when your service scales up). But if you have only one request at a time, you will pay for a BIG instance to do a small amount of processing.
Smaller instances let you optimize cost and add only a small slice of resources at a time, but you will have to spawn new instances more often and endure more cold starts.
Optimize what you prefer, find the right balance. No magic formula!!
You can simulate a load of requests against your current settings using k6.io, check the memory and CPU percentage of your container, and adjust them up or down to see if you can get more RPS out of a single container.
Once you are satisfied with what a single container instance can handle, let's say 100 RPS, you can then set the gcloud flags --min-instances and --max-instances, depending of course on the --concurrency flag, which in this example would be set to 100.
Also note that concurrency starts at the default of 80 and can go up to 1000.
More info about this can be read on the links below:
https://cloud.google.com/run/docs/about-concurrency
https://cloud.google.com/sdk/gcloud/reference/run/deploy
I would also recommend investigating whether you need to pass the --cpu-throttling or --no-cpu-throttling flag, depending on how you want to handle cold starts.
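If you prefer to keep these settings declarative instead of passing flags on every deploy, the equivalent knobs can go into a Cloud Run service YAML (applied with gcloud run services replace). This is only a sketch; the service name, image and numbers are assumptions to adapt to your load-test results:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-api                                   # hypothetical service name
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "1"    # equivalent of --min-instances
        autoscaling.knative.dev/maxScale: "50"   # equivalent of --max-instances
    spec:
      containerConcurrency: 100                  # equivalent of --concurrency
      containers:
      - image: gcr.io/my-project/my-api:latest   # hypothetical image
        resources:
          limits:
            cpu: "1"                             # equivalent of --cpu
            memory: 512Mi                        # equivalent of --memory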

'Kubelet stopped posting node status' and node inaccessible

I am having some issues with a fairly new cluster where a couple of nodes (always seems to happen in pairs but potentially just a coincidence) will become NotReady and a kubectl describe will say that the Kubelet stopped posting node status for memory, disk, PID and ready.
All of the running pods are stuck in Terminating (can use k9s to connect to the cluster and see this) and the only solution I have found is to cordon and drain the nodes. After a few hours they seem to be being deleted and new ones created. Alternatively I can delete them using kubectl.
They are completely inaccessible via ssh (timeout) but AWS reports the EC2 instances as having no issues.
This has now happened three times in the past week. Everything does recover fine but there is clearly some issue and I would like to get to the bottom of it.
How would I go about finding out what has gone on if I cannot get onto the boxes at all? (Actually just occurred to me to maybe take a snapshot of the volume and mount it so will try that if it happens again, but any other suggestions welcome)
Running Kubernetes v1.18.8.
There are two common possibilities here, both most likely caused by heavy load:
An Out of Memory condition on the kubelet host. It can be addressed by adding proper --kubelet-extra-args to BootstrapArguments, for example: --kubelet-extra-args "--kube-reserved memory=0.3Gi,ephemeral-storage=1Gi --system-reserved memory=0.2Gi,ephemeral-storage=1Gi --eviction-hard memory.available<200Mi,nodefs.available<10%" (a declarative equivalent is sketched after this answer).
An issue explained here:
kubelet sometimes cannot patch its node status because more than 250 resources stay on the node and kubelet cannot watch more than 250 streams with kube-apiserver at the same time. So, I just adjusted kube-apiserver's --http2-max-streams-per-connection to 1000 to relieve the pain.
You can either adjust the values provided above or try to find the cause of the high load/IOPS and tune it down.
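For reference, the reservations from the first point can also be expressed as a KubeletConfiguration file, which the kubelet loads via its --config flag. This is a sketch with values roughly matching the flags above (0.3Gi ≈ 300Mi, 0.2Gi ≈ 200Mi); adjust them to your node size:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
kubeReserved:                 # resources set aside for Kubernetes system daemons
  memory: "300Mi"
  ephemeral-storage: "1Gi"
systemReserved:               # resources set aside for OS system daemons
  memory: "200Mi"
  ephemeral-storage: "1Gi"
evictionHard:                 # evict Pods before the node itself runs out
  memory.available: "200Mi"
  nodefs.available: "10%"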
I had the same issue: after 20-30 minutes my nodes went NotReady, and all pods scheduled on those nodes became stuck in Terminating status. I tried to connect to my nodes via SSH; sometimes I hit a timeout, sometimes I could (barely) connect, and I ran the top command to check the running processes. The most resource-hungry process was kswapd0. My instance's memory and CPU were both maxed out, because it was swapping heavily (due to a lack of memory), causing the kswapd0 process to consume more than 50% of the CPU.
Root cause: some pods consumed 400% of their memory request (defined in the Kubernetes deployment) because they were initially under-provisioned. As a consequence, when my pods started, Kubernetes scheduled them counting only 32Mi of memory request per pod (the value I had defined), but that was insufficient.
Solution: increase the containers' resource requests, from:
requests:
  memory: "32Mi"
  cpu: "20m"
limits:
  memory: "256Mi"
  cpu: "100m"
to these values (in my case):
requests:
  memory: "256Mi"
  cpu: "20m"
limits:
  memory: "512Mi"
  cpu: "200m"
Important:
After that, I performed a rolling update (cordon > drain > delete) of my nodes to make sure that Kubernetes reserves enough memory for my freshly started pods right away.
Conclusion:
Regularly check your pods' memory consumption, and adjust your resources requests over time.
The goal is to never let your nodes be caught out by memory saturation, because swapping can be fatal for them.
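One optional way to do that regular check with measured data rather than guesswork (my suggestion, not part of the answer above) is a VerticalPodAutoscaler in recommendation-only mode, assuming the VPA components are installed in your cluster; it observes actual usage and suggests request values without evicting anything:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa                # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                  # hypothetical Deployment to observe
  updatePolicy:
    updateMode: "Off"             # recommend only; never resize or evict Pods

kubectl describe vpa my-app-vpa then shows the recommended requests, which you can copy into the Deployment manifest during a normal rollout.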
The answer turned out to be an issue with IOPS as a result of du commands coming from (I think) cAdvisor. I have moved to io1 boxes and have had stability since then, so I am going to mark this as closed, with the move of EC2 instance types as the resolution.
Thanks for the help!

Faster killing of ECS containers based on inactive tasks?

I've been lurking for years and the time has finally come to post my first question!
So, my GitLab/Terraform/AWS pipeline pushes containers to Fargate. Once the task definition gets updated, the new containers go live and pass health checks. At this point both the old and the new containers are up.
It takes several minutes until the auto-scaler shuts down the old containers. This is in a dev environment so nobody is accessing anything and there are no connections to drain. Other than manually, is there a way to make this faster or even instant?
Thanks in advance!
There is a way to reduce the time you have to wait for tasks to drain. Go to EC2 -> Target Groups -> (Select your target group) -> Description and scroll down. At the bottom is a property called "Deregistration Delay". This is the amount of time the target group will allow connections to drain before shutting down a container (I think it defaults to 5 minutes). Just reduce that value and you should be able to deploy much quicker. Hope this helps!
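Since the pipeline is Terraform-driven, it is worth setting this in code rather than in the console so it survives future applies; in Terraform it is the deregistration_delay argument on the aws_lb_target_group resource. As a sketch, the same attribute in CloudFormation-style YAML (logical names, port and VPC ID are placeholders):

Resources:
  AppTargetGroup:                        # hypothetical logical name
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      TargetType: ip                     # Fargate tasks register by IP
      Port: 8080                         # assumed container port
      Protocol: HTTP
      VpcId: vpc-0123456789abcdef0       # placeholder VPC ID
      TargetGroupAttributes:
        - Key: deregistration_delay.timeout_seconds
          Value: "30"                    # default is 300 seconds (5 minutes)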

Elasticsearch percolation dead slow on AWS EC2

Recently we switched our cluster to EC2 and everything is working great... except percolation :(
We use Elasticsearch 2.2.0.
To reindex (and percolate) our data we use a separate EC2 c3.8xlarge instance (32 cores, 60GB, 2 x 160 GB SSD) and tell our index to include only this node in allocation.
Because we'll distribute it amongst the rest of the nodes later, we use 10 shards, no replicas (just for indexing and percolation).
There are about 22 million documents in the index and 15,000 percolators. The index is a tad smaller than 11 GB (and so easily fits into memory).
About 16 PHP processes talk to the REST API, doing multi-percolate requests with 200 percolations each (we made it smaller because of performance; it was 1000 per request before).
One percolation request (a real one, tapped off the running PHP processes) takes around 2m20s under load (from the 16 PHP processes). That would have been OK if one of the resources on the EC2 instance were maxed out, but that's the strange thing (see stats output here, but also seen in htop, iotop and iostat): load, CPU, memory, heap, IO; everything is well (very well) within limits. There doesn't seem to be a shortage of resources, but still, percolation performance is bad.
When we back off the php processes and try the percolate request again, it comes out at around 15s. Just to be clear: I don't have a problem with a 2min+ multi percolate request. As long as I know that one of the resources is fully utilized (and I can act upon it by giving it more of what it wants).
So, ok, it's not the usual suspects, let's try different stuff:
To rule out network, coordination, etc issues we also did the same request from the node itself (enabling the client) with the same pressure from the php processes: no change
We upped the processors configuration in elasticsearch.yml and restarted the node to fake our way to a higher usage of resources: no change.
We tried tweaking the percolate and get pool size and queue size: no change.
When we looked at the hot threads, we saw UsageTrackingQueryCachingPolicy coming up a lot, so we did as suggested in this issue: no change.
Maybe it's the amount of replicas, seeing Elasticsearch uses those to do searches as well? We upped it to 3 and used more EC2 to spread them out: no change.
To determine if we could actually use all resources on EC2, we did stress tests and everything seemed fine, getting it to loads of over 40. Also IO, memory, etc showed no issues under high strain.
It could still be the batch size. Under load we tried a batch of just one percolator in a multi percolate request, directly on the data & client node (dedicated to this index) and found that it used 1m50s. When we tried a batch of 200 percolators (still in one multi percolate request) it used 2m02s (which fits roughly with the 15s result of earlier, without pressure).
This last point might be interesting! It seems that it's stuck somewhere for a loooong time and then goes through the percolate phase quite smoothly.
Can anyone make anything out of this? Anything we have missed? We can provide more data if needed.
Have a look at the thread on the Elastic Discuss forum to see the solution.
TLDR;
Use multiple nodes on one big server to get better resource utilization.
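To make the TL;DR concrete: the idea is to run several Elasticsearch processes on the one big machine, each with its own moderate heap, so that more percolation threads and heaps share the hardware. A sketch of an elasticsearch.yml for one of those instances on ES 2.x follows; the node name, ports and paths are assumptions, and each instance would get its own copy with unique values:

cluster.name: percolate-cluster          # identical on every instance on the machine
node.name: percolate-node-1              # unique per instance
path.data: /var/lib/elasticsearch/node-1 # separate data path per instance
http.port: 9201                          # distinct HTTP port per instance
transport.tcp.port: 9301                 # distinct transport port per instance

Each process would also get its own ES_HEAP_SIZE, kept well below 32 GB so compressed object pointers stay enabled.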

Design help: Akka clustering and dynamically created Actors

1. I have N nodes (i.e. distinct JREs) in my infrastructure running Akka (not clustered yet).
2. Nodes have no particular "role"; they are just processors of data. The "processors" of this data will be Actors. All sorts of non-Akka/Actor callers (other Java code) can invoke specific types of processors by sending them messages with data to work on. Eventually they need the result back.
3. A "processor" Actor is pretty simple and supports a method like "process(data)"; they are stateless, and they mutate and send data to an external system. These processors can vary in execution time, so they are a good fit for wrapping up in an Actor.
4. There are numerous different types of these "processors", and the configuration for each unique one is stored in a database. Each node in my system, when it starts up, needs to create a router Actor that fronts N instances of each of these unique processor Actor types. I cannot statically define/name/create these Actors hardwired in code or in Akka configuration.
5. It is important to note that the configuration for any Actor processor can be changed in the database at any time, and periodically the creator of the routers for these Actors needs to terminate and recreate them dynamically based on the new configuration.
6. A key point is that some of these "processors" can only have a very limited number of Actor instances across all of my nodes. E.g. processorType-A can have an unlimited number of instances, while processorType-B can only have 2 instances running across the entire cluster. Hence callers on NODE1 who want to invoke processorType-B would need to have their message routed to NODE2, because that node is the only node running processorType-B actor instances.
With that context in mind here is my question that I'm looking for some design help with:
For points 1, 2, 3 and 4 above, I have a good understanding of, and an implementation for, what is needed.
For points 5 and 6, however, I am not sure how to properly implement this with Akka clustering, given that my "nodes" are not aware of each other AND they each run the same code to dynamically create these router actors based on that database configuration.
Issues that come to mind are:
How do I properly deal with the "names" of these router Actors across the cluster? E.g. for "processorType-A", which can have an unlimited number of Actor instances, each node would locally have these instances available, yet if they are all terminated on a single node, I would still want messages for that "processor type" to be routed on to another node that still has viable instances available.
How do I deal with enforcing/coordinating the "processor" instance limitation across the cluster (i.e. "processorType-B" can only have 2 instances globally, while processorType-A can have a much higher number)? It's like the nodes need some way to check with each other as to who has created these instances across the cluster. I'm not sure if Akka has a facility to do this on its own.
ClusterRouterPool? w/ ClusterRouterPoolSettings?
Any thoughts and/or design tips/ideas are much appreciated! Thanks.