AWS Application Load Balancer metrics not showing

I have created an AWS Application Load Balancer and configured the target group and everything so I can test something out. When I try to hit the load balancer I get a 502 Bad Gateway error, which is expected. However, no metrics are showing up in the Monitoring section of the load balancer, even though I submitted around 5 requests.
Furthermore, even after registering an ECS service, I still get a bad gateway error. This is what I see on the load balancer/target groups after registering the service.
I have also allowed all inbound and outbound traffic on the two security groups (the security group used by the ECS service and the security group used by the load balancer).
Also, although I specified two availability zones when creating the ECS service, only one registered target shows up.

Figured it out, and it's kind of silly: my VPN/network was blocking the outgoing call to the ALB. I'm not sure why; maybe some sort of network policy. The URL looks something like this: my-lb-1123366532.us-west-1.elb.amazonaws.com. I wasted almost a day trying to figure this out, so I'm putting it out here in case it helps someone.

Related

Cannot access a public ALB

I have been trying to troubleshoot some connection issues, and I'm struggling with a relatively simple setup.
On my (relatively new) AWS account, I create a new Application Load Balancer and configure it in the following way:
- Internet-facing
- Uses the default VPC that came with the account
- Spans all availability zones
- Uses the default security group for the VPC
- Listens on HTTP:80 and returns a fixed response (status 404)
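In Terraform terms, that configuration would look roughly like this (a sketch only; all names and IDs below are placeholders, not values from my account):

```hcl
# Sketch of the setup described above; every name/ID is a placeholder.
resource "aws_lb" "test" {
  name               = "test-alb"
  internal           = false          # internet-facing
  load_balancer_type = "application"
  security_groups    = ["sg-0123456789abcdef0"]                                  # default VPC security group
  subnets            = ["subnet-aaaa1111", "subnet-bbbb2222", "subnet-cccc3333"] # one per AZ
}

resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.test.arn
  port              = 80
  protocol          = "HTTP"

  # Return a fixed 404 response to every request
  default_action {
    type = "fixed-response"
    fixed_response {
      content_type = "text/plain"
      status_code  = "404"
    }
  }
}
```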
When I then try to use the newly assigned DNS name, it just hangs. Using curl -v I can see it says:
Trying :80
dig also responds with 3 IPs (I'm assuming one per availability zone).
It feels like I'm missing something obvious, but I'm struggling to find it myself.
Can anyone see what I may be missing?
Can you please share a screenshot of the default security group and LB configuration?
I am almost sure that the default security group allows ALL inbound traffic, but only from itself (i.e., from sources in the same security group), so nothing from the internet can reach the listener.
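If that's the case, the fix is an inbound rule that opens the listener port to the internet. A minimal sketch in Terraform, assuming the listener is on port 80 (the security group ID is a placeholder):

```hcl
# Allow HTTP from anywhere into the load balancer's security group.
resource "aws_security_group_rule" "alb_http_in" {
  type              = "ingress"
  protocol          = "tcp"
  from_port         = 80
  to_port           = 80
  cidr_blocks       = ["0.0.0.0/0"]
  security_group_id = "sg-0123456789abcdef0" # placeholder: the LB's security group
}
```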

Load balancer giving failed_to_pick_backend with internet network endpoint group

I have a load balancer set up to point to an external URL via an internet network endpoint group (internet NEG).
The load balancer returns a 502 status code and I see failed_to_pick_backend in the logs. The Monitoring tab of the load balancer also shows INVALID_BACKEND next to the internet NEG I've defined. I've attached screenshots of the view for clarity; the latter one is the one that's failing. I've compared the NEGs and they seem identical.
All the suggestions so far mention fixing health checks, but as seen from the docs, internet NEGs do not support health checks.
I was able to create a working setup through the UI, but when replicating it via Terraform, things start to fail. The only difference I could see was that in the setup done via the UI, the forwarding rule had ipVersion: IPV4; that was not possible to set up through Terraform, since the resource takes either ipVersion or ip, and I gave it ip.
So, what could cause failed_to_pick_backend & INVALID_BACKEND with a setup like mine?
I found the answer to my question in another post: https://serverfault.com/a/1065279/965524
The google_compute_global_network_endpoint needs to be created before the google_compute_backend_service, so you need to set depends_on = [google_compute_global_network_endpoint.endpoint] on your google_compute_backend_service. Otherwise you will hit errors like the ones described in the question.
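A minimal sketch of that ordering in Terraform (the names, FQDN, and ports are placeholders; the depends_on line is the important part):

```hcl
resource "google_compute_global_network_endpoint_group" "neg" {
  name                  = "external-neg" # placeholder name
  network_endpoint_type = "INTERNET_FQDN_PORT"
  default_port          = 443
}

resource "google_compute_global_network_endpoint" "endpoint" {
  global_network_endpoint_group = google_compute_global_network_endpoint_group.neg.name
  fqdn                          = "backend.example.com" # placeholder external host
  port                          = 443
}

resource "google_compute_backend_service" "service" {
  name     = "external-backend" # placeholder name
  protocol = "HTTPS"

  backend {
    group = google_compute_global_network_endpoint_group.neg.id
  }

  # Ensure the endpoint exists before the backend service is created;
  # otherwise the backend can come up empty and the load balancer
  # returns failed_to_pick_backend / INVALID_BACKEND.
  depends_on = [google_compute_global_network_endpoint.endpoint]
}
```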

Slow response when "Unmanaged Instance Group" added to HTTPS Load Balancer

The HTTPS Load Balancer proxy works great with a managed instance group but not with unmanaged instance groups. We have added a few unmanaged instance groups to the backend and have instructed the proxy to direct specific traffic to them, e.g. https://test.example.com goes to an unmanaged instance group. When testing is done we can stop the instances in the unmanaged instance groups; stopping individual VM instances within a managed group is not possible.
Everything is working as expected. However, the browser takes 10-15 seconds (not always, but mostly) to display the page and randomly receives a 500 error. It seems that either the instances in the unmanaged group are stopped, or the load balancer does some housekeeping that takes a long time to respond.
Any help or suggestions to fix the response time would be highly appreciated. Accessing the web server directly, bypassing the load balancer, works as expected, but HTTPS can't be used that way since only the proxy server has the SSL certificate.
I'm taking an educated guess here based on your detailed description of symptoms.
As you noticed, there's something going on "behind the scenes" of the load balancer: either health checks are failing, or some other mechanism responsible for telling the load balancer that the test backend is shut off is misbehaving.
This shouldn't be happening and it looks like a bug.
At this point I think the best way forward is to report a new issue at Google's Issue Tracker and include a detailed description of what happens. You may link to this question too :)
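In the meantime, it may be worth double-checking how the health check attached to the unmanaged group's backend service is configured, since long intervals or thresholds would roughly match the 10-15 second stalls. A hypothetical Terraform sketch (every name and value here is an assumption, not taken from your setup):

```hcl
# Hypothetical health check with tight intervals so stopped instances
# are marked unhealthy quickly.
resource "google_compute_health_check" "web" {
  name                = "web-health-check"
  check_interval_sec  = 5
  timeout_sec         = 5
  healthy_threshold   = 2
  unhealthy_threshold = 2

  http_health_check {
    port         = 80
    request_path = "/"
  }
}

resource "google_compute_backend_service" "unmanaged" {
  name          = "unmanaged-backend"
  health_checks = [google_compute_health_check.web.id]

  backend {
    # Placeholder URL of the unmanaged instance group
    group = "https://www.googleapis.com/compute/v1/projects/my-project/zones/us-central1-a/instanceGroups/test-group"
  }
}
```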

AWS CodeDeploy: stuck on install step

I'm running through this tutorial to create a deployment pipeline with my custom .NET-based Docker image.
But when I start a deployment, it gets stuck on the Install phase, so I have to stop it manually:
After that I get a couple of running tasks with different task definitions (note :1 and :4, because I've tried to run the deployment 4 times by now):
They also cycle through the states RUNNING -> PROVISIONING -> PENDING all the time, and the list of stopped tasks grows:
Q:
So, how do I hunt down the issue with CodeDeploy? Why is it running forever?
UPDATE:
It is connected to health checks.
UPDATE:
I'm getting this:
(service dataapi-dev-service, taskSet ecs-svc/9223370487815385540) (port 80) is unhealthy in target-group dataapi-dev-tg1 due to (reason Health checks failed with these codes: [404]).
I don't quite understand why it's failing for the newly created container, because the original one passes the health check.
While the ECS task is running, the ELB (Elastic Load Balancer) constantly health-checks the container, as configured in the target group, to verify that the container is still responding.
From your debug message, the container (api) responded to the health check path with a 404.
I suggest you configure the health check path in the target group dataapi-dev-tg1.
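A hedged sketch of what that might look like in Terraform, assuming the container serves a 200 on /health (the path and VPC ID are placeholders):

```hcl
# Point the target group's health check at a path the container
# actually serves with a 200, instead of one that returns 404.
resource "aws_lb_target_group" "dataapi_dev_tg1" {
  name     = "dataapi-dev-tg1"
  port     = 80
  protocol = "HTTP"
  vpc_id   = "vpc-0123456789abcdef0" # placeholder

  health_check {
    path                = "/health" # placeholder: whatever path the app serves
    matcher             = "200"
    interval            = 30
    timeout             = 5
    healthy_threshold   = 3
    unhealthy_threshold = 3
  }
}
```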
For those who are still hitting this issue: in my case the ECS cluster had no outbound connectivity.
Possible solutions to this problem:
- make sure the security groups you use with your VPC allow outbound traffic
- make sure the route table you use with the VPC has subnet associations with the subnets used by your load balancer (examine the route tables)
I was able to figure it out because I enabled CloudWatch during ECS cluster creation and got CannotPullContainerError. For more information on solving this problem, look into "Cannot Pull Container Image Error".
If your load balancer is internet-facing, make sure your internet gateway is reachable from your subnets through the route table (routes).
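A rough Terraform sketch of that routing setup (all IDs are placeholders):

```hcl
# Route table whose default route points at the internet gateway,
# associated with a subnet used by the load balancer.
resource "aws_route_table" "public" {
  vpc_id = "vpc-0123456789abcdef0" # placeholder

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = "igw-0123456789abcdef0" # placeholder: the attached internet gateway
  }
}

resource "aws_route_table_association" "public_subnet" {
  subnet_id      = "subnet-0123456789abcdef0" # placeholder: a load balancer subnet
  route_table_id = aws_route_table.public.id
}
```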
The error is due to a health check detecting an unhealthy target.
Make sure to check your configuration in the target group settings.

Can't register EC2 instance in ELB

I'm trying to set up an AWS environment for the first time and have so far got a DNS entry pointing to a Classic ELB. I also have an EC2 instance running, but I don't seem to be able to add it to the ELB in the UI. The documentation says I can do this in the "Instances" tab on the Load Balancer screen, but I don't have that tab at all. All I can see is Description | Listeners | Monitoring | Tags.
Does anyone have any ideas why the "Instances" tab might be missing from the UI?
There are now two different types of Elastic Load Balancer.
ELB/1.0, the original, is now called a "Classic" balancer.
ELB/2.0, the new version, is an "Application Load Balancer" (or "ALB").
They each have different features and capabilities.
One notable difference is that ALB doesn't simply route traffic to instances, it routes traffic to targets on instances, so (for example) you could pool multiple services on the same instance (e.g. port 8080, 8081, 8082) even though those requests all came into the balancer on port 80. And these targets are configured in virtual objects called target groups. So there are a couple of new abstraction layers, and this part of the provisioning workflow is much different. It's also provisioned using a different set of APIs.
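As a sketch of that model in Terraform (illustrative only; all IDs are placeholders), one instance can back two target groups on different ports:

```hcl
resource "aws_lb_target_group" "svc_a" {
  name     = "svc-a"
  port     = 8080
  protocol = "HTTP"
  vpc_id   = "vpc-0123456789abcdef0" # placeholder
}

resource "aws_lb_target_group" "svc_b" {
  name     = "svc-b"
  port     = 8081
  protocol = "HTTP"
  vpc_id   = "vpc-0123456789abcdef0" # placeholder
}

# The same instance is registered in both target groups, once per port.
resource "aws_lb_target_group_attachment" "a" {
  target_group_arn = aws_lb_target_group.svc_a.arn
  target_id        = "i-0123456789abcdef0" # placeholder instance
  port             = 8080
}

resource "aws_lb_target_group_attachment" "b" {
  target_group_arn = aws_lb_target_group.svc_b.arn
  target_id        = "i-0123456789abcdef0" # same instance, second service
  port             = 8081
}
```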
Since the new Application Load Balancer is the default selection in the console wizard, it's easy to click past it, not realizing that you've selected something other than the classic ELB you might have been expecting, and that sounds like what occurred in this case.
It might be that you selected an Application Load Balancer instead of a Classic Load Balancer.
If that is the case, you need to add your instance to a target group, as the Instances tab is not available for Application Load Balancers.
I hope the above resolves your case.