Unexpected latency issues AWS-API Gateway - amazon-web-services

I need help troubleshooting AWS API Gateway latency. We have the same configuration and even the same data in both environments, but we are facing high latency in non-prod. We are using an NLB and a VPC link for API Gateway. Sample values are below.
We copied the data from the dev MongoDB to the test environment to make sure the same volume of data is present in both places. We hit /test/16 from both environments, but see very high latency in dev compared to sandbox.
Test:
Request: /test/16
Status: 200
Latency: 213 ms
Dev:
Request: /test/16
Status: 200
Latency: 4896 ms

Have you checked your VPC flow logs to see the paths your requests take? If not, I suggest starting there.
For reference, you can learn about VPC flow logs at https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html#working-with-flow-logs.
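As a rough illustration of what to look for once the logs are available, here is a minimal Python sketch that parses records in the default (version 2) flow log format and picks out rejected traffic. Flow logs show which flows were accepted or rejected, not request latency itself, so this only helps rule the network path in or out. The `FlowRecord` type and the helper names are made up for this example.

```python
from typing import List, NamedTuple, Optional

# Field order of the default (version 2) VPC flow log record format.
FIELDS = (
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
)


class FlowRecord(NamedTuple):
    """Illustrative subset of a flow log record (not an AWS type)."""
    srcaddr: str
    dstaddr: str
    dstport: int
    action: str    # "ACCEPT" or "REJECT"
    window_s: int  # length of the capture window, NOT request latency


def parse_record(line: str) -> Optional[FlowRecord]:
    """Parse one default-format record; return None for NODATA/SKIPDATA."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    rec = dict(zip(FIELDS, parts))
    if rec["log_status"] != "OK":
        # NODATA and SKIPDATA records carry "-" in most fields.
        return None
    return FlowRecord(
        srcaddr=rec["srcaddr"],
        dstaddr=rec["dstaddr"],
        dstport=int(rec["dstport"]),
        action=rec["action"],
        window_s=int(rec["end"]) - int(rec["start"]),
    )


def rejected(lines: List[str]) -> List[FlowRecord]:
    """Return only the records whose traffic was rejected."""
    records = (parse_record(line) for line in lines)
    return [r for r in records if r is not None and r.action == "REJECT"]
```

Rejected flows between the NLB subnets and the targets would point at security groups or network ACLs rather than at the application.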

What is behind the load balancer? Are you reaching anything by DNS name, or only by IP?
We had a similar problem at one point. Looking at the monitoring for the load balancer (ELB), we found that the problem was downstream.
The monitoring even showed 504s at the load balancer.
In our case DNS caching caused it: the target instances had been replaced, and some nginx instances on the network path to the target were still using the old DNS answers.
The nginx instances had to be reconfigured to resolve DNS dynamically, since by default nginx resolves the target only at startup.
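The failure mode above (a proxy that resolves its upstream once at startup and then keeps a stale IP) can be sketched in Python. This only illustrates the idea of re-resolving after a TTL; it is not how nginx implements it, and the `TtlResolver` class, its parameters and the 30 second default are assumptions made for the example.

```python
import socket
import time
from typing import Callable, List, Optional


def system_resolve(host: str) -> List[str]:
    """Resolve a host name to its current set of IP addresses."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


class TtlResolver:
    """Caches DNS answers, but re-resolves once the TTL has expired.

    A proxy that resolves only at startup behaves like ttl_s = infinity:
    if the targets are replaced, it keeps sending traffic to dead IPs.
    """

    def __init__(
        self,
        host: str,
        ttl_s: float = 30.0,  # hypothetical default for the example
        resolve: Callable[[str], List[str]] = system_resolve,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._host = host
        self._ttl_s = ttl_s
        self._resolve = resolve
        self._clock = clock
        self._addrs: Optional[List[str]] = None
        self._expires = 0.0

    def addresses(self) -> List[str]:
        """Return cached addresses, refreshing them after the TTL."""
        now = self._clock()
        if self._addrs is None or now >= self._expires:
            self._addrs = self._resolve(self._host)
            self._expires = now + self._ttl_s
        return self._addrs
```

The resolver and clock are injectable so the behaviour can be checked without real DNS lookups.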
Without knowing your architecture, however, it is hard to say what is causing your problems. Here is another DNS story, with some debugging examples: https://srvaroa.github.io/kubernetes/migration/latency/dns/java/aws/microservices/2019/10/22/kubernetes-added-a-0-to-my-latency.html 🍿
Good luck.

Related

HAproxy vs ALB or any other load balancer which one to use?

We are looking to move our blog platform to a separate EC2 server (running Nginx) for better performance and scalability.
Scenario is:
Web request (www.example.com) -> Load Balancer/Route -> Current EC2 Server
Blog request (www.example.com/blog) -> Load Balancer/Route -> New Separate EC2 Server for blog
Which of these is the best option to use in this case?
HAProxy
ALB - AWS
Any other solution?
Also, is it possible to have the load balancer or routing mechanism in a different AWS region? We are currently hosted in AWS.
HAProxy
You would have to set this up on an EC2 server and manage everything yourself. You would be responsible for scaling this correctly to handle all the traffic it gets. You would be responsible for deploying it to multiple availability zones to provide high availability. You would be responsible for installing all security updates on the operating system.
ALB - AWS
Amazon will automatically scale this out to handle any amount of traffic you get. Amazon will handle all security patches of the underlying system. Amazon provides free SSL certificates for ALBs. Amazon will deploy this automatically across multiple availability zones to provide high availability.
Any other solution?
I think AWS Global Accelerator would work here as well, but you would have to weigh the differences between Global Accelerator and ALB to decide which fits your use case and budget the best.
You could also look at placing a CDN in front of everything, like CloudFront or Cloudflare.
Also, is it possible to have the load balancer or routing mechanism in a different AWS region?
AWS Global Accelerator would be the thing to look at if load balancing across regions is a concern for you. Given the details you have provided, however, I'm not sure why you would want this.
Probably what you really need is a CDN in front of your websites, with or without the ALB.
Scenario is:
Web request (www.example.com) -> Load Balancer/Route -> Current EC2 Server
Blog request (www.example.com/blog) -> Load Balancer/Route -> New Separate EC2 Server for blog
In my view you can use an ALB deployed across multiple AZs for high availability, for the following reasons:
An AWS ALB can route traffic based on various request attributes, and the URL path is one of them.
https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-listeners.html#rule-condition-types
With an ALB you can have two target groups of instances, one handling traffic for the first path (www.example.com) and a second handling the other path (www.example.com/blog).
An ALB supports SNI, which lets a single ALB serve multiple certificates for multiple domains, so all you need to do is set up a single HTTPS listener and upload your certificates: https://aws.amazon.com/blogs/aws/new-application-load-balancer-sni/
I have answered [something similar]; it might help you as well.
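To make the path-based routing idea above concrete, here is a small Python model of how listener rules are evaluated: rules are checked in priority order, the first match wins, and a default action catches everything else. This is a sketch of the logic only, not the AWS API; the `Rule` type and the target group names are hypothetical, and `fnmatchcase` is used as an approximation of ALB wildcard matching (`*` and `?`), although it also treats `[` specially.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List


@dataclass(frozen=True)
class Rule:
    """Illustrative stand-in for a listener rule (not an AWS type)."""
    priority: int       # lower numbers are evaluated first
    path_pattern: str   # may contain * and ? wildcards
    target_group: str


def route(path: str, rules: List[Rule], default_target_group: str) -> str:
    """Return the target group for a request path (first match wins)."""
    for rule in sorted(rules, key=lambda r: r.priority):
        # Case-sensitive match, like path conditions on an ALB.
        if fnmatchcase(path, rule.path_pattern):
            return rule.target_group
    return default_target_group


# Hypothetical setup for the scenario in the question.
RULES = [
    Rule(priority=10, path_pattern="/blog", target_group="blog-servers"),
    Rule(priority=20, path_pattern="/blog/*", target_group="blog-servers"),
]
```

Using two patterns (`/blog` and `/blog/*`) instead of a single `/blog*` avoids accidentally capturing unrelated paths such as `/blogger`.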
This is my opinion, so take it as that. I am sure a lot of people won't agree.
If your project is small or personal, you can go with HAProxy (cheap: USD 4 or less if you get a t3a as a spot instance, or free if you run it inside another EC2 instance of yours, perhaps using Docker).
If your project is not personal or not small, go with ALB (expensive, but simpler and better integrated with other AWS services).
HAProxy can handle tons of connections, but you have to do more things yourself. ALB can also handle tons of connections, and AWS will do most of the work.
I think HAProxy is more suitable for personal/small projects because if your project doesn't grow, you don't have to touch HAProxy. It is set-and-forget, the same as ALB, but costs less.
You usually won't care about availability zones or disaster tolerance in a personal project, so HAProxy should be easy to configure.
Another consideration: AWS offers a free tier on ALB, so if your project will run for less than a year ALB is the way to go.
If you are learning, then ALB should be considered, because real clients usually love to stick with AWS in all aspects, whereas HAProxy is your call and also your risk (taken just to reduce costs for a company that usually pays a lot more for your salary, so it is not worth the risk).

Single EC2 behind ELB creates massive latency

I have a Flask API running on a single EC2 instance. I need to add SSL (https) to my app and the docs seem to indicate that the best way is to use ELB.
The problem is that when I set up the EC2 instance behind the ELB, with a Route 53 record pointing to the load balancer, I get latency issues like never before. API calls that should take 150ms take 31s. If I ping the EC2 instance directly there is no issue, but pinging via Route 53/ELB takes too long.
I've looked at past responses such as these:
The reason for the delay is that you have the ELB set up for multi-AZ without any application instances configured in the other 2 AZs. Without instances in those AZs, requests will tend to fail because the ELB still returns IP addresses for those AZs even if there are no active application instances. Please disable the other AZs for now and continue your tests.
So I deleted the subnets and AZs that are not related to my EC2 instance, but now I have:
503 errors: Backend server is at capacity all over the place.
About to pull my hair out. All I am trying to do is set up SSL for my app...
For anyone else struggling with this, using the Application Load Balancer instead of the Classic Load Balancer has resolved this latency issue.
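The quoted explanation earlier in this question (a load balancer enabled in AZs that have no instances) is easy to check mechanically. Here is a minimal Python sketch, assuming you already have the list of AZs enabled on the load balancer and a mapping of instance ID to AZ; the function name and the sample IDs are hypothetical.

```python
from typing import Dict, Iterable, List


def azs_without_instances(
    lb_azs: Iterable[str],
    instance_azs: Dict[str, str],
) -> List[str]:
    """Return the load balancer AZs that have no registered instance.

    lb_azs: AZs enabled on the load balancer.
    instance_azs: mapping of instance ID -> AZ the instance runs in.
    """
    used = set(instance_azs.values())
    return sorted(az for az in set(lb_azs) if az not in used)
```

For a single instance in `us-east-1a` behind a load balancer enabled in three AZs, this reports the two AZs that would receive traffic with nothing to serve it.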
This seems like a connectivity issue. Follow these steps to drill down into it. Since it only happens through the load balancer, first run nslookup on the load balancer's DNS name (nslookup <name>). You will get at least two IPs.
Then curl each load balancer IP individually and check the response time. You can use the command below, or any tool you prefer.
curl -w "dns_resolution: %{time_namelookup}, \
tcp_established: %{time_connect}, \
ssl_handshake_done: %{time_appconnect}, \
TTFB: %{time_starttransfer}\n" \
-o /dev/null -s "http://<ELB IP>/<path>"
This way you can figure out where the issue is. It may be due to a routing issue on one specific AZ subnet, or some other issue.
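The same per-IP check can be sketched in Python using only the standard library. This version measures just the TCP connect time to each resolved address (roughly curl's `time_connect` without the DNS part), which is usually enough to spot one load balancer IP that is slow or unreachable. The function names are made up for this example.

```python
import socket
import time
from typing import Dict, List, Optional


def resolve_all(host: str, port: int) -> List[str]:
    """Return every IP address the name currently resolves to."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def time_connect(ip: str, port: int, timeout_s: float = 5.0) -> Optional[float]:
    """Return seconds taken to open a TCP connection, or None on failure."""
    start = time.perf_counter()
    try:
        with socket.create_connection((ip, port), timeout=timeout_s):
            pass
    except OSError:
        # Covers timeouts and refused connections: the IP is unreachable.
        return None
    return time.perf_counter() - start


def connect_times(host: str, port: int = 80) -> Dict[str, Optional[float]]:
    """Time a TCP connect to each IP behind the host name."""
    return {ip: time_connect(ip, port) for ip in resolve_all(host, port)}
```

An IP that returns `None` or a much larger value than its siblings points at the AZ or subnet behind that address.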

"ELB health is failing or not available for all instances" + "Request timed out" on Elastic Beanstalk

I have been trying for two days to get rid of this error. It also often says "100.0 % of the requests are failing with HTTP 5xx". I have been reading about troubleshooting from here https://aws.amazon.com/premiumsupport/knowledge-center/elb-fix-failing-health-checks-alb/ but nothing is working. I have tried changing the health check path from '/' to '/healthCheck' as that has worked for some other people.
INFO:
I am using an application load balancer so that I can use HTTPS. I am using t3.micro although I have tried t3.small and t3.medium.
Here are my load balancer settings in the configuration part of console:
My security group for the instance has the same two inbound rules with source 0.0.0.0/0, and allows all outbound traffic to 0.0.0.0/0.
And here is some target group info:
Where is the best place to look for this error?
Based on the comments:
The cause of the issue was not determined, so it was decided to create a new EB environment in an effort to address the problem.
I had a similar problem. I'm not sure yet how prevalent it will be, but it happened to me twice after a number of broken deploys during testing. I even created a specific endpoint to return 200, without luck. It seems to be more of a load balancer problem than an instance problem. Creating a new environment cleared up the problem for me.
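For reference, a dedicated health endpoint like the one mentioned above can be as small as the following Python sketch, written with the standard library so that it runs on its own. It did not fix the problem in this thread, but it helps rule out the application as the cause of failing health checks. The `/healthCheck` path comes from the question; the port, the helper names and the `probe` function are assumptions made for the example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

HEALTH_PATH = "/healthCheck"


class HealthHandler(BaseHTTPRequestHandler):
    """Answers 200 on the health path and 404 on everything else."""

    def do_GET(self) -> None:
        if self.path == HEALTH_PATH:
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format: str, *args) -> None:
        pass  # keep the example quiet; real services should log


def make_server(port: int = 8080, host: str = "0.0.0.0") -> HTTPServer:
    """Create (but do not start) the health check server."""
    return HTTPServer((host, port), HealthHandler)


def run_in_thread(server: HTTPServer) -> threading.Thread:
    """Serve in a background thread, e.g. for a local test."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread


def probe(host: str, port: int, path: str = HEALTH_PATH) -> int:
    """Do what a health checker does: GET the path, return the status."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()
```

Running `probe` from another instance in the same VPC against the instance's private IP shows whether the endpoint is reachable independently of the load balancer.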

API Gateway/NLB/ECS Latency

I have a number of services deployed in ECS. They register with a Network Load Balancer (via a target group). The NLB is private, and is accessed via API Gateway + a VPC link.
Most of the time, requests to my services take ~4-5 seconds, but occasionally < 100ms. The latter should be the standard; the actual requests are served by my node instances in ~10ms or less. I'm starting to dig into this, but was wondering if there was a common bottleneck in setups similar to what I'm currently using.
Any insight would be greatly appreciated!
The answer to this was to enable Cross-Zone Load Balancing on my load balancers. This isn't immediately obvious and took two AWS support sessions to dig it up as the root cause.
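For anyone who wants to script this, here is a minimal Python sketch of turning the setting on through the Elastic Load Balancing v2 API. It assumes a boto3 `elbv2` client (boto3 is a third-party package, not part of the standard library) and that you supply the load balancer's ARN yourself; the helper names are made up for this example.

```python
from typing import Any, Dict, List

# Load balancer attribute that controls cross-zone load balancing.
CROSS_ZONE_KEY = "load_balancing.cross_zone.enabled"


def cross_zone_attributes(enabled: bool = True) -> List[Dict[str, str]]:
    """Build the attribute list; attribute values are strings."""
    return [{"Key": CROSS_ZONE_KEY, "Value": "true" if enabled else "false"}]


def enable_cross_zone(client: Any, load_balancer_arn: str) -> Any:
    """Enable cross-zone load balancing on one load balancer.

    `client` is expected to be a boto3 "elbv2" client (assumption).
    """
    return client.modify_load_balancer_attributes(
        LoadBalancerArn=load_balancer_arn,
        Attributes=cross_zone_attributes(True),
    )


# Usage (requires boto3 and AWS credentials; the ARN is yours to fill in):
#   import boto3
#   enable_cross_zone(boto3.client("elbv2"), "<your NLB ARN>")
```

Cross-zone load balancing is off by default on Network Load Balancers, which is why this is easy to miss.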

Usefulness of Amazon ELB (Elastic Load Balancing)

We're considering implementing an ELB in our production Amazon environment. It seems it will require that production server instances be synced by a nightly script. Also, there is a Solr search engine which will need to be replicated and maintained for each paired server. There's also the issue of debugging: which server is a request going to? If there's a crash, do you have to search both logs? If a production app isn't behaving, how do you isolate which one it is, or do you just deploy debugging code to both instances?
We aren't having issues with response time or server load. This seems like added complexity in exchange for a limited upside. It seems like it may be overkill to me. Thoughts?
You're enumerating the problems that arise when you need high availability :)
You need to consider how critical the availability of the service is, and take that into account when deciding whether this is the right solution or just over-engineering :)
Solutions to some caveats:
To avoid nightly syncs: Use an EC2 instance as an NFS server and mount the share on both EC2 instances. (Or use Amazon EFS when it's available)
Debugging problem: You can configure the EC2 instances behind the ELB to have public IPs, restricted in the security groups to the developers' PCs, and when debugging point your /etc/hosts (or Windows equivalent) at one particular server.
Logs: store the logs in S3 (or on the NFS server mentioned above)