502 Server Error sometimes on Google Compute Engine - google-cloud-platform

I set up a server on Google Compute Engine with Apache on Ubuntu 16.04.4 LTS. It's protected with IAP.
It ran fine for about 6 months, but now some users encounter a 502 Server Error.
I already checked the following links:
Some 502 errors in GCP HTTP Load Balancing [Changed the Apache KeepAliveTimeout to 620]
502 response coming from errors in Google Cloud LoadBalancer [Removed ajax requests]
But the problem is still there.
Here is the error message from one of the logs:
{
httpRequest: {…}
insertId: "170sg34g5fmld90"
jsonPayload: {
#type: "type.googleapis.com/google.cloud.loadbalancing.type.LoadBalancerLogEntry"
statusDetails: "failed_to_pick_backend"
}
logName: "projects/sggc-web01/logs/requests"
receiveTimestamp: "2018-03-14T07:21:55.807802906Z"
resource: {…}
severity: "WARNING"
spanId: "44a49bf1b3893412"
timestamp: "2018-03-14T07:21:53.048717425Z"
trace: "projects/sggc-web01/traces/f35119d8571f20df670b0d53ab6b3210"
}
Please help me to trace and fix the issue. Thank you!

The error is not being caused by the server but by the load balancer.
The statusDetails value "failed_to_pick_backend" tells us that all the instances were unhealthy (or still are) when the load balancer tried to establish the connection.
This can be because:
1 - The CPU usage of the instances was too high and they weren't able to answer the health check requests from the load balancer, so they showed up as unhealthy to it.
2 - The health checks aren't being allowed in the firewall (I doubt this is the reason if it worked before)
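To check both possibilities, a minimal sketch with gcloud (the backend service name `my-backend-service` and network `default` are placeholders; the source ranges are Google's documented health-check probe ranges):

```shell
# Show the current health state of each backend instance behind the
# load balancer (replace my-backend-service with your backend service).
gcloud compute backend-services get-health my-backend-service --global

# Allow Google's health-check probes through the firewall.
# 130.211.0.0/22 and 35.191.0.0/16 are the documented source ranges
# for GCP HTTP(S) load balancer health checks.
gcloud compute firewall-rules create allow-health-checks \
    --network=default \
    --direction=ingress \
    --action=allow \
    --rules=tcp:80 \
    --source-ranges=130.211.0.0/22,35.191.0.0/16
```

If `get-health` reports instances as UNHEALTHY while CPU usage is normal, the firewall rule (or the health check's port/path) is the first thing to verify.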

Related

PCF - cf push - 502 Bad Gateway

We are trying to deploy an EAR in PCF; the deployment bundle's size is around 200 MB. The buildpack we use is the WAS Liberty buildpack. When we do a cf push, we consistently get the error below:
HTTP/1.1 502 Bad Gateway
X-Cf-Routererror: endpoint_failure (context deadline exceeded)
io: read/write on closed pipe
Is there a specific reason for this behavior apart from network bandwidth, latency, etc.?

Istio Circuit Breaker who trips it?

I am currently doing research on the service mesh Istio, version 1.6. The data plane (Envoy proxies) is configured by the control plane.
When I configure a Circuit Breaker by creating a Destination rule and the circuit breaker opens, does the client side sidecar proxy already return the 503 or the server side sidecar proxy?
Does the client side sidecar proxy route the request to another available instance of the service automatically or does it simply return the 503 to the application container?
Thanks in advance!
You can inspect the log entries to figure out both ends of the connection that was stopped by the circuit breaker. The IP addresses of both sides of the connection are present in the log message from the istio-proxy container.
{
insertId: "..."
labels: {
k8s-pod/app: "circuitbreaker-jdwa8424"
k8s-pod/pod-template-hash: "..."
}
logName: ".../logs/stdout"
receiveTimestamp: "2020-06-09T05:59:30.209882320Z"
resource: {
labels: {
cluster_name: "..."
container_name: "istio-proxy"
location: "..."
namespace_name: "circuit"
pod_name: "circuit-service-a31cb334d-66qeq"
project_id: "..."
}
type: "k8s_container"
}
severity: "INFO"
textPayload: "[2020-06-09T05:59:27.854Z] UO 0 0 0 "-" - - 172.207.3.243:443 10.1.13.216:36774 "
timestamp: "2020-06-09TT05:59:28.071001549Z"
}
The message comes from the istio-proxy container, which runs the Envoy instance affected by the CircuitBreaker policy applied to the request. The entry also contains the IP addresses of both the source and the destination of the interrupted connection.
It will return a 503. There is an option to configure retries; however, I did not test how retries interact with the CircuitBreaker, or whether a retry actually goes to a different pod if the previous one returned an error.
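For illustration, a minimal circuit breaker sketch applied with kubectl (the host name, namespace, and thresholds are made-up values, not taken from the question; `consecutive5xxErrors` is available from Istio 1.6):

```shell
# Hypothetical DestinationRule: eject a pod from the load-balancing
# pool for 30s after a single consecutive 5xx error. When all pods
# are ejected, callers receive a 503 from their client-side sidecar.
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: circuit-service
  namespace: circuit
spec:
  host: circuit-service
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 1
      interval: 10s
      baseEjectionTime: 30s
      maxEjectionPercent: 100
EOF
```

With `maxEjectionPercent: 100` every unhealthy pod can be ejected, so the 503 is returned to the application container rather than routed elsewhere.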
Also take a look at the most detailed explanation of CircuitBreaker I managed to find.
Hope it helps.

istio tracking network request and finding point of failure

Using Istio 1.2.10-gke.3 on gke
curl -v -HHost:user.domain.com --resolve user.domain.com:443:$gatewayIP https://user.domain.com/auth -v -k
returns a 503 after TLS verification:
< date: Tue, 19 May 2020 20:50:29 GMT
< server: istio-envoy
Now I want to track the request, identify the first point of failure by tracing the logs of the components involved, and resolve the issue.
The logs of the istio-ingressgateway pod show nothing. After getting a shell on the pod, I run top and see an envoy process running; however, I don't see any logs for Envoy in /var/log/.
What am I missing? Am I looking in the wrong place? Or do I need to read the code of the framework to be able to use it?
I need to find out which link in the request-processing chain broke first, and why, so that it can be fixed.
Here are some useful links to the Istio documentation for debugging a 503 error:
Istio documentation for envoy access logs
Istio documentation for Connectivity troubleshooting.
Useful envoy debugging tool istioctl.
$ istioctl proxy-status
There is also one rare case in which a 503 can appear:
this error can also occur if the Envoy sidecar proxy has issues or was not properly injected into the deployment's pods, or when there are mTLS misconfigurations.
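A few commands that may help locate the failing link (a sketch; the namespace and label values are placeholders, and `istioctl analyze` only exists in Istio releases newer than the 1.2.10 mentioned in the question):

```shell
# Envoy writes its access logs to stdout, not /var/log, so read
# them with kubectl logs on the gateway pod.
kubectl logs -n istio-system -l app=istio-ingressgateway --tail=100

# List container names per pod to confirm the istio-proxy sidecar
# was actually injected into the workload ("myns" is a placeholder).
kubectl get pods -n myns \
  -o jsonpath='{range .items[*]}{.metadata.name}: {.spec.containers[*].name}{"\n"}{end}'

# On newer Istio releases, scan the namespace for configuration
# problems, including mTLS mismatches.
istioctl analyze -n myns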
Hope it helps.

Route53 Domain Transfer - Registry error - 2400 : Command failed (421 SESSION TIMEOUT)

I am trying to transfer a domain using Route53 and after a few minutes I receive an email with the following error.
Registry error - 2400 : Command failed (421 SESSION TIMEOUT)
Anyone have any ideas what this means or how to get around it?
I have never seen your error. There is a document on transferring domains that covers error messages. The reason I am responding is that I have seen domain transfers to Route 53 fail without ever learning why. Maybe this will help you.
NSI Registry Registrar Protocol (RRP)
421 Command failed due to server error. Client should try again.
A transient server error has caused RRP command failure. A subsequent retry may produce successful results.

getting java.io.IOException: com.amazonaws.AmazonServiceException while deploying the application to cloud hub

Hi. I have a VPC established between CloudHub and my network, and it was fine when I used to deploy the application there.
Suddenly, I have started getting the following error while deploying the application in the same environment:
Cannot update load balancer: java.io.IOException: com.amazonaws.AmazonServiceException: Rate for operation ChangeResourceRecordSets exceeded (Service: AmazonRoute53; Status Code: 400; Error Code: Throttling; Request ID: c811181f-0887-11e7-873f-5d34219fe6f8)
Can someone let me know what's happening here?
Please refer to this forum post:
https://forums.mulesoft.com/questions/60915/getting-javaioioexception-comamazonawsamazonservic.html
I think this may be related to the following.
Amazon Route53 has had issues since yesterday (see https://status.aws.amazon.com/ for more info). Creation/update/delete queries are being throttled.
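Since the error code is Throttling, the standard mitigation is to retry with exponential backoff. A minimal shell sketch (here `deploy_app` is a stand-in mock for whatever triggers the ChangeResourceRecordSets call, not a real CloudHub command):

```shell
attempts=0
deploy_app() {
  # Mock deploy: fails twice, then succeeds, to stand in for a
  # throttled API call that recovers after backoff.
  attempts=$((attempts + 1))
  [ "$attempts" -ge 3 ]
}

# Retry with exponential backoff: wait 1s, 2s, 4s, ... between tries.
for attempt in 1 2 3 4 5; do
  if deploy_app; then
    echo "succeeded on attempt $attempt"
    break
  fi
  sleep $((2 ** (attempt - 1)))
done
```
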