How to get latency metric from AWS CloudWatch Application ELB?

Is there any way to get latency from AWS/ApplicationELB namespace? I know it is available in the AWS/ELB namespace, but I need it for AWS/ApplicationELB, as this is what I use.

The latency metric on ELB is comparable to the TargetResponseTime metric on ALB.
ELB Latency definition: (source)
The total time elapsed, in seconds, from the time the load balancer
sent the request to a registered instance until the instance started
to send the response headers.
ALB TargetResponseTime definition: (source)
The time elapsed, in seconds, after the request leaves the load
balancer until a response from the target is received. This is
equivalent to the target_processing_time field in the access logs.
Further Reading
AWS Documentation - CloudWatch Metrics for Your Application Load Balancer
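To pull the metric from the AWS/ApplicationELB namespace programmatically, something like the following boto3 sketch works. The LoadBalancer dimension value used below (app/my-alb/1234567890abcdef) is a placeholder; substitute the value from the tail of your ALB's ARN.

```python
from datetime import datetime, timedelta, timezone

def target_response_time_request(load_balancer_dim, minutes=60):
    # Build a GetMetricStatistics request for TargetResponseTime in the
    # AWS/ApplicationELB namespace, covering the last `minutes` minutes.
    now = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ApplicationELB",
        "MetricName": "TargetResponseTime",
        "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer_dim}],
        "StartTime": now - timedelta(minutes=minutes),
        "EndTime": now,
        "Period": 60,
        "Statistics": ["Average", "Maximum"],
    }

def fetch_target_response_time(load_balancer_dim):
    import boto3  # requires configured AWS credentials
    cw = boto3.client("cloudwatch")
    return cw.get_metric_statistics(
        **target_response_time_request(load_balancer_dim)
    )
```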

Related

How to monitor fargate ECS web app timeouts in CloudWatch?

I have a simple setup: a Fargate ECS cluster behind an ALB, running a web API.
I want to monitor (and raise alarms on) the number of requests that timed out for my web app. The only metric close to that I found in CloudWatch is AWS/ApplicationELB -> TargetResponseTime
But, it seems like requests that timed out from the ALB point of view are not recorded there at all.
How do you monitor ALB timeouts?
This answer covers timeouts only from the ALB's point of view.
It is confusing because there is no specific metric that is named, or even contains, the word timeout.
An ALB timeout generates an HTTP 408 error code, for which the ALB internally increments HTTPCode_ELB_4XX_Count.
From the Docs
The load balancer sends the HTTP code to the client, saves the request to the access log, and increments the HTTPCode_ELB_4XX_Count or HTTPCode_ELB_5XX_Count metric.
In my view you can set up a CloudWatch alarm to monitor the HTTPCode_ELB_4XX_Count metric and initiate an action (such as sending a notification to an email address) if the metric goes outside what you consider an acceptable range.
More details about HTTPCode_ELB_4XX_Count: https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-cloudwatch-metrics.html
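As a boto3 sketch of that alarm (the alarm name, threshold, and SNS topic are illustrative assumptions, not values from the question):

```python
def elb_4xx_alarm_spec(load_balancer_dim, sns_topic_arn, threshold=10):
    # PutMetricAlarm spec: fire when more than `threshold` ELB-generated
    # 4XX responses (which include the 408 timeouts) occur in one minute.
    return {
        "AlarmName": "alb-elb-4xx-spike",
        "Namespace": "AWS/ApplicationELB",
        "MetricName": "HTTPCode_ELB_4XX_Count",
        "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer_dim}],
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no traffic should not alarm
        "AlarmActions": [sns_topic_arn],     # e.g. an email-subscribed topic
    }

def create_alarm(spec):
    import boto3  # requires configured AWS credentials
    boto3.client("cloudwatch").put_metric_alarm(**spec)
```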

Application Load Balancer doesn't hold requests until the Auto Scaling group launches new instances

On AWS, I created an auto-scaling group with an automated scaling policy that adds a new instance based on an Application Load Balancer: Average Request Count Per Target above 5.
The scaling metric is based on the number of HTTP requests sent to the load balancer per target.
The ASG is set to min 1, max 10 and desired 1.
I tried sending 200 requests to the ELB and recording the IP of the instance that received each request in a database. I found that most of the requests went to the same instance, some of them received a 504 Gateway Timeout, and a few received nothing at all.
The ASG launches new instances, but only after the requests have already been sent, so the new instances receive nothing from the load balancer.
I think the reason is that CloudWatch reports the average number of requests per instance at intervals of a minute or more, and launching a new instance may take longer than the request's timeout.
Q: Is there a method to keep the requests in a queue or increase their timeout until the new instances exist, and then distribute these requests across all instances instead of losing them?
Q: If users send many requests at the same time, how can I make the ASG start scaling immediately and distribute those requests uniformly across the instances, keeping a specific average number of requests per instance?
The solution was to use Amazon Simple Queue Service. We forwarded the messages from the API Gateway to the queue. Then a CloudWatch alarm was used to launch ECS Fargate tasks when the queue size was > 1, to read messages from the queue and process them. When the queue is empty, another alarm sets the number of tasks in the ECS service to 0.
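A boto3 sketch of the scale-out half of that setup, an alarm on the queue's visible-message count (the queue name and action ARN are placeholders; the scale-in alarm is the same spec with the comparison inverted):

```python
def queue_backlog_alarm_spec(queue_name, scale_out_action_arn):
    # Fires when at least one message is waiting in the queue, triggering
    # the action that launches ECS Fargate tasks to drain it.
    return {
        "AlarmName": f"{queue_name}-backlog",
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": 0.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [scale_out_action_arn],
    }

def create_backlog_alarm(spec):
    import boto3  # requires configured AWS credentials
    boto3.client("cloudwatch").put_metric_alarm(**spec)
```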

How can I set up a Cloudwatch alarm for HTTP 4XX/5XX on an ECS service/task?

I'm trying to set up Cloudwatch alarms to monitor my application running in Amazon ECS. This web application runs in Docker containers, configured as an ECS service behind an application load balancer and inside an autoscaling group that can step up/down the number of running tasks.
I've been looking through the different namespaces and metrics that are available in CloudWatch but am not seeing quite what I'm looking for. If my application starts throwing a high number of HTTP 5XX errors, I want to know about it. Likewise, if my application were to throw a high number of HTTP 4XX errors, I want to know about that as well.
I see that there are metrics such as HTTPCode_ELB_4XX_Count and HTTPCode_ELB_5XX_Count on the load balancer, but this is not the same as application monitoring. The documentation for those specific metrics even states "This count does not include any response codes generated by the targets."
Which (if any) metrics will monitor the HTTP codes generated by the targets, in the context of an ECS service or task?
If you're using an Application Load Balancer for your application, it's very simple:
Go to the EC2 dashboard
Open the target group (the one attached to your Docker containers)
Select the Monitoring tab
Create an alarm there
and select the HTTP 4XX or 5XX target count metric
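The same thing can be set up programmatically. Below is a sketch that sums the target-generated 4XX and 5XX counts with CloudWatch metric math (the dimension value is a placeholder); the resulting list can be passed as MetricDataQueries to get_metric_data, or as the Metrics of a metric-math put_metric_alarm:

```python
def target_error_queries(load_balancer_dim):
    def stat(metric_name, query_id):
        return {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/ApplicationELB",
                    "MetricName": metric_name,
                    "Dimensions": [
                        {"Name": "LoadBalancer", "Value": load_balancer_dim}
                    ],
                },
                "Period": 60,
                "Stat": "Sum",
            },
            "ReturnData": False,  # only the combined expression is returned
        }

    return [
        stat("HTTPCode_Target_4XX_Count", "e4"),
        stat("HTTPCode_Target_5XX_Count", "e5"),
        {
            "Id": "errors",
            "Expression": "e4 + e5",
            "Label": "Target 4XX+5XX per minute",
            "ReturnData": True,
        },
    ]
```

Unlike HTTPCode_ELB_*, the HTTPCode_Target_* metrics count only response codes generated by the targets themselves, which is what application-level monitoring needs here.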

AWS CloudWatch Web Server Metrics

I have a few EC2 instances with NGINX installed using both ports 80 and 443. The instances are serving different applications so I'm not using an ELB.
I would like to create a CloudWatch alarm to make sure port 80 is always returning 200 HTTP status code. I realize there are several commercial solutions for this such as New Relic, etc, but this is the task I have at hand at the moment.
None of the EC2 metrics look to be able to accomplish this, and I cannot use any ELB metrics since I have no ELB.
What's the best way to resolve this?
You can definitely do this manually: send a request yourself and publish the result as a custom metric to CloudWatch, then monitor that metric.
Or you could look into Route53 health checks. You might get away with just configuring a health check there if you are already using Route53:
http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/dns-failover.html
Create a Route53 Health Check. Supported protocols are TCP, HTTP, and HTTPS.
The HTTP/S protocol supports matching the response payload against a user-defined string so you can not only react to connectivity problems but also to unexpected content being returned to users.
For more advanced monitoring, enable latency metrics, which collect TTFB (time to first byte) and SSL handshake times.
You can then create alarms to get alerts when one of your apps becomes inaccessible.
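A boto3 sketch of such a health check (the domain, path, and search string are illustrative; the endpoint must be reachable from Route53's public checkers):

```python
import uuid

def http_health_check_request(domain, path="/", search_string=None):
    # CreateHealthCheck request: plain HTTP, or HTTP_STR_MATCH when a
    # given string must appear in the response body.
    config = {
        "Type": "HTTP_STR_MATCH" if search_string else "HTTP",
        "FullyQualifiedDomainName": domain,
        "Port": 80,
        "ResourcePath": path,
        "RequestInterval": 30,   # seconds between checks
        "FailureThreshold": 3,   # consecutive failures before unhealthy
    }
    if search_string:
        config["SearchString"] = search_string
    return {
        "CallerReference": str(uuid.uuid4()),  # idempotency token
        "HealthCheckConfig": config,
    }

def create_health_check(request):
    import boto3  # requires configured AWS credentials
    return boto3.client("route53").create_health_check(**request)
```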

Why would AWS ELB (Elastic Load Balancer) sometimes returns 504 (gateway timeout) right away?

The ELB occasionally returns a 504 to our clients right away (in under 1 second).
The problem is that it's totally random; when we repeat the request right away, it works as it should.
Does anyone have the same issue or any idea about this?
Does this answer your question:
Troubleshooting Elastic Load Balancing: HTTP Errors
HTTP 504: Gateway Timeout
Description: Indicates that the load balancer closed a connection because a request did not complete within the idle timeout period.
Cause: The application takes longer to respond than the configured idle timeout.
Solution: Monitor the HTTPCode_ELB_5XX and Latency CloudWatch metrics. If there is an increase in these metrics, it could be due to the application not responding within the idle timeout period. For details about the requests that are timing out, enable access logs on the load balancer and review the 504 response codes in the logs that are generated by Elastic Load Balancing. If necessary, you can increase your back-end capacity or increase the configured idle timeout so that lengthy operations (such as uploading a large file) can complete.
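If raising the idle timeout is the chosen fix, here is a boto3 sketch for an ALB (the ARN is a placeholder; a Classic ELB instead takes ConnectionSettings.IdleTimeout via the elb client's modify_load_balancer_attributes):

```python
def idle_timeout_update(load_balancer_arn, seconds=120):
    # ModifyLoadBalancerAttributes (elbv2) request raising the idle
    # timeout; attribute values are passed as strings in this API.
    return {
        "LoadBalancerArn": load_balancer_arn,
        "Attributes": [
            {"Key": "idle_timeout.timeout_seconds", "Value": str(seconds)}
        ],
    }

def apply_update(request):
    import boto3  # requires configured AWS credentials
    boto3.client("elbv2").modify_load_balancer_attributes(**request)
```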
Or this:
504 gateway timeout LB and EC2