SurgeQueueLength equivalent for Application Load Balancers

I'm looking to set up auto scaling for a service running on AWS ECS. The ECS auto scaling docs suggest using SurgeQueueLength to determine whether to trigger a scaling event. We use an Application Load Balancer, which does not have this metric, and looking through the table of metrics nothing seems equivalent. Am I missing something, or is this just a missing feature of ALBs at present?

Disclaimer: I don't have experience with Application Load Balancers; I'm just deriving these facts from the AWS docs. For a more hands-on read, see the ALB section of this Medium post.
You are correct: there is no SurgeQueueLength among the CloudWatch metrics for Application Load Balancers. This is also confirmed by an AWS employee in this thread. However, the following metrics could be used as CloudWatch triggers for auto scaling:
TargetConnectionErrorCount: in my opinion this corresponds best to SurgeQueueLength, as it indicates that the load balancer tried to open a connection to a backend node and failed.
HTTPCode_ELB_5XX_Count: depending on the backend nodes, this might indicate that they are refusing new connections because, for example, their maximum connection count has been reached.
RejectedConnectionCount: this is what the AWS employee suggested in the thread linked above. However, the docs describe it as the "number of connections that were rejected because the load balancer had reached its maximum number of connections", which sounds more like a limit on the AWS side that you cannot really influence (it is not described in the ALB limits).
RequestCountPerTarget: the average number of requests each backend node receives per minute. If you track this over time you may be able to work out a "healthy" threshold (see the sketch below).
TargetResponseTime: the number of seconds a backend node takes to answer a request. Another candidate for a "healthy" threshold (i.e. "what is the maximum response time you want end users to experience?").
Overall there seems to be no single correct answer to your question; it depends on your situation.
The question that suggests itself is: why are there no queue metrics such as SurgeQueueLength? This is not answered anywhere in the docs. My guess is that either ALBs are designed differently from Classic ELBs, or the metric simply is not exposed yet.
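If RequestCountPerTarget fits your workload, one concrete way to act on it is to let ECS Service Auto Scaling track it via Application Auto Scaling, which supports ALBRequestCountPerTarget as a predefined target-tracking metric. A minimal boto3 sketch, in which the cluster, service, capacity bounds, target value and target-group resource label are placeholders to replace with your own:
import boto3

autoscaling = boto3.client("application-autoscaling")

# Register the ECS service as a scalable target (all names are placeholders).
autoscaling.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId="service/my-cluster/my-service",
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=2,
    MaxCapacity=10,
)

# Target-tracking policy on the ALB's RequestCountPerTarget metric.
autoscaling.put_scaling_policy(
    PolicyName="alb-requests-per-target",
    ServiceNamespace="ecs",
    ResourceId="service/my-cluster/my-service",
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 1000.0,  # desired requests per target per minute; tune for your app
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ALBRequestCountPerTarget",
            # Format: app/<lb-name>/<lb-id>/targetgroup/<tg-name>/<tg-id>
            "ResourceLabel": "app/my-alb/1234567890abcdef/targetgroup/my-tg/0123456789abcdef",
        },
        "ScaleOutCooldown": 60,
        "ScaleInCooldown": 300,
    },
)
The same pattern works with a custom alarm on TargetResponseTime if latency turns out to be the better signal for your service.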

ALBs are designed differently and don't have SurgeQueueLength or SpillOver metrics. Source: AWS Staff.

Related

ECS Tags not reported correctly in cost explorer

I have some infrastructure running on AWS for a client. Because I expect more clients to ask me to handle (part of) their infrastructure in the future, I have set up tagging on all of the resources provisioned for that client. These tags not only allow me to differentiate quickly between the resources, but also give me a clear report in Cost Explorer. There, I just group by Service and filter on the "client" tag with a value corresponding to that client's reference code.
However, I've run into a problem. As far as I can tell, I've set up all my tags correctly, yet about 80% of the cost grouped under "Elastic Container Service" keeps being reported as "No TagKey: client".
When I look in the ECS dashboard, though, everything seems to be tagged correctly: the service, the tasks, everything. Through Terraform, I've enabled the properties on the ECS service which I figured would be needed for these tags to propagate correctly:
enable_ecs_managed_tags = true
propagate_tags = "SERVICE"
As far as I can tell, the tags are there for the service, the tasks, the underlying EC2 instances, the load balancer, the cluster itself, the subnet... What am I missing? Remember, Cost Explorer is reporting that 80% of the total cost of "Elastic Container Service" is not tagged, so I must be missing something pretty significant here.
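One way to double-check what ECS itself reports, independent of the console, is to list the tags on the running tasks through the API; a minimal boto3 sketch with placeholder cluster and service names:
import boto3

ecs = boto3.client("ecs")

# List the tags ECS actually attached to the running tasks of the service.
task_arns = ecs.list_tasks(cluster="my-cluster", serviceName="my-service")["taskArns"]
for arn in task_arns:
    tags = ecs.list_tags_for_resource(resourceArn=arn)["tags"]
    print(arn, {t["key"]: t["value"] for t in tags})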

Is it possible to measure HTTP response latencies without changing my server code?

I have a small number of HTTP servers on GCP VMs. I have a mixture of different server languages and Linux-based OSes.
Questions
A. Is it possible to use the Stackdriver monitoring service to set alerts at specific percentiles for HTTP response latencies?
B. Can I do this without editing the code of each server process?
C. Will installing the agent into the VM report HTTP latencies?
For example, if the 95th percentile goes over 100ms for a certain time period I want to know.
I know I can do this for CPU utilisation and other hypervisor-provided stats using:
https://console.cloud.google.com/monitoring/alerting
Thanks.
Request latencies are recorded by the cloud load balancers. As long as you are using a cloud load balancer, you don't need to install the monitoring agent to create alerts based on 95th-percentile metrics.
The monitoring agent captures latencies for some preconfigured systems such as Riak, Cassandra and a few others. Here's the full list of systems and metrics the monitoring agent supports by default: https://cloud.google.com/monitoring/api/metrics_agent
But if you want anything custom, i.e. you want to measure request latencies from the VM itself, you would need to capture response times yourself and configure the logging agent to create a custom metric, which you can then use for alerts. As long as you capture them as distribution metrics, you should be able to visualise different percentiles (25th, 50th, 75th, 90th, 95th, 99th, etc.) and create alerts based on those.
see: https://cloud.google.com/logging/docs/logs-based-metrics/distribution-metrics
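To illustrate the alerting step, here is a minimal sketch using the Cloud Monitoring Python client; the metric name http_response_latency is a placeholder for whatever you call your logs-based distribution metric, and the threshold assumes the recorded values are in seconds:
from google.cloud import monitoring_v3

client = monitoring_v3.AlertPolicyServiceClient()
policy = monitoring_v3.AlertPolicy(
    display_name="p95 HTTP response latency above 100 ms",
    combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.AND,
    conditions=[
        monitoring_v3.AlertPolicy.Condition(
            display_name="p95 latency threshold",
            condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
                # Placeholder name for your logs-based distribution metric.
                filter='metric.type = "logging.googleapis.com/user/http_response_latency"',
                aggregations=[
                    monitoring_v3.Aggregation(
                        alignment_period={"seconds": 300},
                        per_series_aligner=monitoring_v3.Aggregation.Aligner.ALIGN_PERCENTILE_95,
                    )
                ],
                comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
                threshold_value=0.1,        # 100 ms, assuming the metric is in seconds
                duration={"seconds": 300},  # must hold for 5 minutes
            ),
        )
    ],
)
client.create_alert_policy(name="projects/my-project", alert_policy=policy)
You would still attach a notification channel to the policy to actually get notified.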
A. Is it possible to use the Stackdriver monitoring service to set alerts at specific percentiles for HTTP response latencies?
If you simply want to consider network traffic, yes, it is possible. If you are using a load balancer, it is also possible to set alerts on that.
What you want to do should be pretty straightforward from the interface; you can also find more info in the documentation.
If you want more advanced metrics on top of Tomcat/Apache2 etc., you should check the list of metrics provided by the Stackdriver monitoring agent here.
B. Can I do this without editing the code of each server process?
Yes, there is no need to update any program; Stackdriver monitoring works transparently and is able to fetch basic metrics from GCP VMs, including network traffic and CPU utilization, without the monitoring agent.
C. Will installing the agent into the VM report HTTP latencies?
No, the agent shouldn't cause any http latencies.

Request Limit Per Second on GCP Load Balancer in front of Storage Bucket website

I want to know the limit on requests per second for the Load Balancer on Google Cloud Platform. I couldn't find this information in the documentation.
My project is a static website hosted in a Storage Bucket behind the Load Balancer, with CDN enabled.
This website will be featured in a television campaign, and the estimate is 100k requests per second for 5 minutes.
Could anyone help me with this information? Is it necessary to ask Support to pre-warm the load balancer before the campaign starts?
From the front page of GCP Load Balancing:
https://cloud.google.com/load-balancing/
Cloud Load Balancing is built on the same frontend-serving infrastructure that powers Google. It supports 1 million+ queries per second with consistent high performance and low latency. Traffic enters Cloud Load Balancing through 80+ distinct global load balancing locations, maximizing the distance traveled on Google's fast private network backbone.
This seems to say that 1 million+ requests per second is fully supported.
However, with all that said, I wouldn't wait for "the day" before testing; see if you can rehearse a suitable load beforehand. Given that this sounds like a one-off event with high visibility (television), I'm sure you don't want to wait for the event only to find out something was wrong in the setup or theory. As for "can a load balancer handle 100K requests per second", the answer appears to be yes.
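As a starting point for such a rehearsal, even a single-machine smoke test will confirm that the bucket, CDN and certificates behave under concurrency. A rough sketch using the third-party aiohttp library, with a placeholder URL; this is nowhere near 100k requests per second on its own, for which you would need a distributed load-testing tool or service:
import asyncio
import time
import aiohttp

URL = "https://www.example.com/"   # placeholder: your load balancer / CDN frontend
CONCURRENCY = 200                  # simultaneous requests per batch
BATCHES = 20

async def fetch(session):
    async with session.get(URL) as resp:
        await resp.read()
        return resp.status

async def main():
    async with aiohttp.ClientSession() as session:
        start = time.monotonic()
        total = errors = 0
        for _ in range(BATCHES):
            statuses = await asyncio.gather(*(fetch(session) for _ in range(CONCURRENCY)))
            total += len(statuses)
            errors += sum(s >= 400 for s in statuses)
        elapsed = time.monotonic() - start
        print(f"{total} requests in {elapsed:.1f}s ({total / elapsed:.0f} req/s), {errors} errors")

asyncio.run(main())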
If you are (or are asking on behalf of) a GCP customer, Google has Technical Account Managers associated with accounts who can be brought into the planning loop, especially if there are questions of "can we do this". One should always be cautious about sudden high-volume needs for GCP resources. Again, through a Technical Account Manager, it does no harm to pre-warn Google of large resource requests. For example, if you said you needed an extra 5000 Compute Engine instances, you might be constrained in which regions are available to you given finite existing capacity. Google, just like other public cloud providers, has to schedule and balance resources in its regions. Timing is also very important: if you need a sudden burst of resources and the time you need them happens to coincide with an event such as Black Friday (US) or Singles Day (China), special preparation may be needed.

How to extract an instance uptime based on incidents?

In Stackdriver, creating an Uptime Check gives you access to the Uptime Dashboard, which shows the uptime percentage of your service.
My problem is that uptime checks are restricted to HTTP/TCP checks. I have other services running, and those services report their health in different ways (for example, by a specific process running). I already have incident policies set up for these services, so if a service is not running I get notified.
Now I want to be able to look back and know how long the service was down for the last hour. Is there a way to do that?
There's no way to programmatically retrieve alerts at the moment, unfortunately. Many resource types expose uptime as a metric, though (e.g., instance/uptime on GCE instances) - could you pull those and do the math on them? Without knowing what resource types you're using, it's hard to give specific suggestions.
Aaron Sher, Stackdriver engineer
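As a sketch of that "do the math" suggestion using the Python monitoring client: sum the instance/uptime samples reported over the last hour and treat whatever is missing from a full hour as downtime. The project ID and the one-hour window are placeholders; add an instance filter if you only care about a single VM, and note that this naive approach ignores sampling gaps.
import time
from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
now = time.time()
interval = monitoring_v3.TimeInterval(
    {
        "end_time": {"seconds": int(now)},
        "start_time": {"seconds": int(now - 3600)},
    }
)
aggregation = monitoring_v3.Aggregation(
    {
        "alignment_period": {"seconds": 3600},
        "per_series_aligner": monitoring_v3.Aggregation.Aligner.ALIGN_SUM,
    }
)
results = client.list_time_series(
    request={
        "name": "projects/my-project",   # placeholder project ID
        "filter": 'metric.type = "compute.googleapis.com/instance/uptime"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
        "aggregation": aggregation,
    }
)
for series in results:
    up = sum(point.value.double_value for point in series.points)
    instance = series.resource.labels.get("instance_id", "unknown")
    print(f"{instance}: ~{max(0, 3600 - up):.0f}s of downtime in the last hour")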

Passenger - Using "Requests in queue" as an AWS metric for autoscaling

I am surprised to find little information regarding EC2 autoscaling with Phusion Passenger.
Not long ago I actually discovered a "Requests in queue" metric that is exposed when running passenger-status.
I am wondering whether this stat would make a nice metric to help with autoscaling.
Right now most AWS EC2 auto scaling guides mention using CPU and memory to write scaling rules, but I find this insufficient. When I think about the problem autoscaling should solve, namely being able to scale up to meet demand, I'd rather base those rules on the number of pending/completed requests to reflect node health or cluster congestion, and Passenger's "Requests in queue" (and also, for each process, the "Last used" and "Processed" counts) seems useful.
I am wondering whether it would be possible to report this "Requests in queue" stat (and eventually others) periodically as an AWS metric. I was thinking the following rule would be ideal for autoscaling: if the average number of requests in queue on the autoscaled instances exceeds a threshold value, this triggers spawning a new machine from the autoscaling group.
Is this possible?
Has anyone ever tried to implement autoscaling rules based on number of requests in queue this way ?
This is totally possible (and a good approach).
Step 1. Create a custom CloudWatch metric for "Requests in queue".
You will have to write your own agent that runs passenger-status, extracts the value, and sends it to CloudWatch. You can use any AWS SDK or just the AWS CLI: http://docs.aws.amazon.com/cli/latest/reference/cloudwatch/put-metric-data.html
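A rough sketch of such an agent in Python with boto3; the namespace and metric name are arbitrary choices here, and the regex assumes the "Requests in queue: N" line that passenger-status prints in its general information section (the tool typically needs to run as root):
import re
import subprocess
import time
import boto3

cloudwatch = boto3.client("cloudwatch")

def requests_in_queue():
    # Parse the "Requests in queue: N" line from passenger-status output.
    output = subprocess.run(
        ["passenger-status"], capture_output=True, text=True, check=True
    ).stdout
    match = re.search(r"Requests in queue:\s*(\d+)", output)
    return int(match.group(1)) if match else 0

while True:
    # Without dimensions, every instance publishes into the same metric,
    # so an Average-based alarm reflects the whole group.
    cloudwatch.put_metric_data(
        Namespace="Passenger",
        MetricData=[{
            "MetricName": "RequestsInQueue",
            "Value": requests_in_queue(),
            "Unit": "Count",
        }],
    )
    time.sleep(60)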
Step 2. Create alarms for scale-up and scale-down based on your custom metric.
Step 3. Modify the scaling policies for your Auto Scaling Group to use your custom alarms to scale up/down.
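Putting steps 2 and 3 together with boto3 could look like the following; the scale-out policy is created first so the alarm can reference its ARN, and the group name, threshold and periods are placeholders (you would mirror this with a scale-in policy and a low-threshold alarm):
import boto3

autoscaling = boto3.client("autoscaling")
cloudwatch = boto3.client("cloudwatch")

# Step 3: a simple scaling policy that adds one instance per alarm.
policy = autoscaling.put_scaling_policy(
    AutoScalingGroupName="my-asg",
    PolicyName="scale-out-on-passenger-queue",
    AdjustmentType="ChangeInCapacity",
    ScalingAdjustment=1,
    Cooldown=300,
)

# Step 2: alarm on the custom metric that fires the policy above.
cloudwatch.put_metric_alarm(
    AlarmName="passenger-requests-in-queue-high",
    Namespace="Passenger",
    MetricName="RequestsInQueue",
    Statistic="Average",
    Period=60,
    EvaluationPeriods=3,            # three consecutive minutes over the threshold
    Threshold=10.0,                 # placeholder queue depth
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[policy["PolicyARN"]],
)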