ELB Intermittently Returns 504 GATEWAY_TIMEOUT - amazon-web-services

I've seen this asked here, here, and here - but without any good answers and was hoping to maybe get some closure on the issue.
I have an ELB connected to 6 instances, all running Tomcat7. Up until Friday there were seemingly no issues at all. However, starting about five days ago, we started getting around two 504 GATEWAY_TIMEOUT responses from the ELB per day, out of roughly 2000 requests (about 0.1%). I turned on access logging and see entries like this:
2018-06-27T12:56:08.110331Z momt-default-elb-prod 10.196.162.218:60132 - -1 -1 -1 504 0 140 0 "POST https://prod-elb.us-east-1.backend.net:443/mobile/user/v1.0/ HTTP/1.1" "BackendClass" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
But my Tomcat7 logs don't have any 504s present at all, implying that the ELB is rejecting these requests without even communicating with the Tomcat.
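The log entry above can be read field by field. Below is a minimal Python sketch, assuming the Classic Load Balancer access-log field order documented by AWS; the helper names parse_elb_line and never_reached_backend are hypothetical. A backend of "-" together with -1 timings is how the ELB records a request it could not dispatch to a registered instance, which AWS documents as happening when the instance closes the connection before the idle timeout or when the client sends a malformed request.

```python
import shlex

# Field order of a Classic Load Balancer access-log entry (assumed from AWS's documented format).
FIELDS = [
    "timestamp", "elb", "client_port", "backend_port",
    "request_processing_time", "backend_processing_time", "response_processing_time",
    "elb_status_code", "backend_status_code", "received_bytes", "sent_bytes",
    "request", "user_agent", "ssl_cipher", "ssl_protocol",
]


def parse_elb_line(line: str) -> dict:
    """Split one access-log line into named fields; shlex keeps quoted values intact."""
    values = shlex.split(line)
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def never_reached_backend(entry: dict) -> bool:
    """True when no backend was recorded and the timings are -1 (request not dispatched)."""
    return entry["backend_port"] == "-" and entry["request_processing_time"] == "-1"


line = ('2018-06-27T12:56:08.110331Z momt-default-elb-prod 10.196.162.218:60132 - -1 -1 -1 '
        '504 0 140 0 "POST https://prod-elb.us-east-1.backend.net:443/mobile/user/v1.0/ HTTP/1.1" '
        '"BackendClass" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2')
entry = parse_elb_line(line)
print(entry["elb_status_code"], never_reached_backend(entry))  # prints: 504 True
```

Filtering a day's logs with never_reached_backend shows whether every 504 has this shape, as opposed to some coming from slow backend responses.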
I've seen people mention setting Tomcat's timeout to be greater than the ELB's timeout, but if that were what was happening (i.e., Tomcat times out and then the ELB shuts down the connection), shouldn't I see a 504 in the Tomcat logs?
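The advice mentioned here comes down to one comparison. If Tomcat closes an idle keep-alive connection at the moment the ELB reuses it for a new request, the request never reaches Tomcat, so there is nothing for Tomcat to log. AWS's Classic Load Balancer troubleshooting guide lists this as a cause of 504s and recommends a backend keep-alive timeout greater than the ELB idle timeout. A small Python sketch of the check follows; the constant and function names are hypothetical, and the default values are assumptions (60 s ELB idle timeout; Tomcat's keepAliveTimeout falling back to connectionTimeout, which the server.xml shipped with Tomcat sets to 20000 ms).

```python
# Assumed defaults: ELB idle timeout in seconds, Tomcat keep-alive timeout in milliseconds.
ELB_IDLE_TIMEOUT_S = 60
TOMCAT_KEEPALIVE_TIMEOUT_MS = 20_000


def backend_may_close_first(elb_idle_timeout_s: float, keepalive_timeout_ms: float) -> bool:
    """True if Tomcat can drop an idle keep-alive connection no later than the ELB does.

    In that case the ELB may send a request on a connection Tomcat has just closed;
    the request never reaches the application, so Tomcat logs nothing.
    """
    return keepalive_timeout_ms / 1000.0 <= elb_idle_timeout_s


print(backend_may_close_first(ELB_IDLE_TIMEOUT_S, TOMCAT_KEEPALIVE_TIMEOUT_MS))  # prints: True
print(backend_may_close_first(ELB_IDLE_TIMEOUT_S, 65_000))  # prints: False
```

Equal timeouts count as risky here (the `<=`), since both sides could close the same idle connection at nearly the same moment.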
Also, nothing has changed in the code in a few months. So this all just started seemingly out of nowhere, and it is too infrequent to be a bigger issue. I checked for a pattern in the timeouts (e.g., Tomcat restarting, the same instance, etc.) but couldn't find one.
I know other people have run into this issue, but any and all help would be greatly appreciated.

Related

Need RCA for why increasing and decreasing unhealthy threshold value worked for "Instance unhealthy in target-group(Healthcheck failed)"?

I got the below error
(service AWS-Service) (instance i-1234567890abcdefg) (port 55551) is unhealthy in (target-group arn:aws:elasticloadbalancing:us-east-1:111111111111:targetgroup/aws-targetgroup/123456789) due to (reason Health checks failed)
My health check configuration is:
Unhealthy threshold: 2
Timeout: 5 seconds
Interval: 30 seconds
Success codes: 200
Resolution: I increased the unhealthy threshold to 5, which resulted in the health checks passing and the application reaching a steady state. I then reduced the unhealthy threshold back to 2, and the application remained in a steady state.
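To reason about the threshold change, it helps to estimate how long a target can keep failing before the ALB marks it unhealthy. Below is a rough Python sketch under simplifying assumptions (checks fire exactly every Interval, and every failing check waits the full Timeout); the function name is hypothetical. One hypothesis consistent with the resolution, though not confirmed by the post, is that the new deployment needed longer than the shorter window to start answering health checks.

```python
def approx_seconds_to_unhealthy(unhealthy_threshold: int, interval_s: int, timeout_s: int) -> int:
    """Rough time from the first failed check until the target is marked unhealthy.

    Assumes checks run exactly every interval_s and each failing check uses the
    full timeout_s before it counts as a failure.
    """
    return (unhealthy_threshold - 1) * interval_s + timeout_s


for threshold in (2, 5):
    print(threshold, approx_seconds_to_unhealthy(threshold, interval_s=30, timeout_s=5))
# prints:
# 2 35
# 5 125
```

Comparing these windows with the container's actual startup time (from its logs) is one way to check the hypothesis.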
What could be the reason for this weird behavior? How can I diagnose the issue further? It could also be that my configuration is not optimized; if so, how can I find the optimal values?
Note1: The application was previously working with the same configuration. This time it was a new deployment. The health check was working from inside the container. There was no application-related issue.
Note2: ALB is being used.
Note3: I have several more applications running with the same target group configuration and the same CPU and memory configuration, and they are of the same nature. They did not face the issue and were working fine. I only faced this in one of my applications.

Eureka Server memory, renew threshold is 0, self preservation issue - AWS

I deployed 2 instances of Eureka server and a total of 12 microservice instances.
Renews (last min) is 24, as expected. But Renew Threshold is always 0. Is this how it is supposed to be when self-preservation is turned on? I'm also seeing this error: THE SELF PRESERVATION MODE IS TURNED OFF. THIS MAY NOT PROTECT INSTANCE EXPIRY IN CASE OF NETWORK/OTHER PROBLEMS. What's the expected behavior in this case, and how do I resolve it if it is a problem?
As mentioned above, I deployed 2 instances of Eureka Server, but after running for around 19-20 hours, one instance of Eureka Server always goes down. Why could that be happening? I checked the running processes with the top command and found that Eureka Server is taking a lot of memory. What needs to be configured on Eureka Server so that it doesn't take so much memory?
Below is the configuration in the application.properties file of Eureka Server:
spring.application.name=eureka-server
eureka.instance.appname=eureka-server
eureka.instance.instance-id=${spring.application.name}:${spring.application.instance_id:${random.int[1,999999]}}
eureka.server.enable-self-preservation=false
eureka.datacenter=AWS
eureka.environment=STAGE
eureka.client.registerWithEureka=false
eureka.client.fetchRegistry=false
Below is the command that I am using to start the Eureka Server instances.
#!/bin/bash
java -Xms128m -Xmx256m -Xss256k -XX:+HeapDumpOnOutOfMemoryError -Dspring.profiles.active=stage -Dserver.port=9011 -Deureka.instance.prefer-ip-address=true -Deureka.instance.hostname=example.co.za -Deureka.client.serviceUrl.defaultZone=http://example.co.za:9012/eureka/ -jar eureka-server-1.0.jar &
java -Xms128m -Xmx256m -Xss256k -XX:+HeapDumpOnOutOfMemoryError -Dspring.profiles.active=stage -Dserver.port=9012 -Deureka.instance.prefer-ip-address=true -Deureka.instance.hostname=example.co.za -Deureka.client.serviceUrl.defaultZone=http://example.co.za:9011/eureka/ -jar eureka-server-1.0.jar &
Is this approach to create multiple instances of Eureka Server correct?
Deployment is on AWS. Is there any specific configuration needed for Eureka Server on AWS?
Spring Boot version: 2.3.4.RELEASE
I am new to all of this; any help or direction would be greatly appreciated.
Let me try to answer your questions one by one.
Renews (last min) is 24, as expected. But Renew Threshold is always 0. Is this how it is supposed to be when self-preservation is turned on?
What's the expected behaviour in this case and how to resolve this if this is a problem?
I can see that you have eureka.server.enable-self-preservation=false in your configuration. Disabling self-preservation is only needed if you want an already registered application to be removed as soon as it fails to renew its lease.
The self-preservation feature exists to prevent exactly that situation, because a missed renewal can simply be caused by a network hiccup. Say you have two services, A and B, both registered with Eureka, and B suddenly fails to renew its lease because of a temporary network hiccup. Without self-preservation, B would be removed from the registry, and A would not be able to reach B even though B is still available.
So we can say that Self-preservation is a resiliency feature of eureka.
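The A/B scenario can be shown with a toy model in Python. This is a hypothetical illustration, not Eureka's implementation: last_renewal maps an instance name to the time of its last heartbeat, and the 90 s lease expiration is assumed to be Eureka's default lease expiration duration.

```python
# Toy model of lease eviction, not Eureka's actual code.
LEASE_EXPIRATION_S = 90  # assumed default lease expiration duration, in seconds


def evict_expired(last_renewal: dict, now: float, self_preservation_active: bool,
                  lease_expiration_s: float = LEASE_EXPIRATION_S) -> dict:
    """Return the registry entries that survive one eviction pass."""
    if self_preservation_active:
        # In self-preservation mode, expired leases are kept instead of evicted.
        return dict(last_renewal)
    return {name: t for name, t in last_renewal.items() if now - t <= lease_expiration_s}


registry = {"A": 100, "B": 0}  # B missed its renewals because of a network hiccup
print(evict_expired(registry, now=100, self_preservation_active=False))  # prints: {'A': 100}
print(evict_expired(registry, now=100, self_preservation_active=True))   # prints: {'A': 100, 'B': 0}
```

With self-preservation active, B stays in the registry, so A can still look it up once the hiccup passes.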
Renews threshold is the expected number of renews per minute. The Eureka server enters self-preservation mode if the actual number of heartbeats in the last minute (Renews) is less than the expected number of renews per minute (Renew Threshold) and self-preservation is enabled.
You can control the Renews threshold by configuring renewal-percent-threshold (by default it is 0.85).
So in your case:
Total number of application instances = 12
You haven't set eureka.instance.leaseRenewalIntervalInSeconds, so the default value of 30s applies
and eureka.client.registerWithEureka=false, so the Eureka servers themselves don't send renewals
so Renews (last min) will be 24
You haven't configured renewal-percent-threshold, so the default value of 0.85 applies
Number of renewals per application instance per minute = 2 (one every 30s)
so if self-preservation were enabled, the Renews threshold would be 2 * 12 * 0.85 = 20.4, i.e. 20
In your case self-preservation is turned off, so Eureka won't calculate the Renews threshold.
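The arithmetic above as a small Python sketch, assuming the threshold is the expected renewals per minute scaled by renewal-percent-threshold and truncated to an integer. The function names are hypothetical and are not Eureka's API.

```python
def renews_per_min_threshold(instances: int, renewal_interval_s: float = 30,
                             renewal_percent_threshold: float = 0.85) -> int:
    """Expected heartbeats per minute, scaled by the percent threshold (truncated to int)."""
    renews_per_instance_per_min = 60.0 / renewal_interval_s
    return int(instances * renews_per_instance_per_min * renewal_percent_threshold)


def would_enter_self_preservation(renews_last_min: int, threshold: int, enabled: bool) -> bool:
    """Self-preservation kicks in only if it is enabled and renewals fall below the threshold."""
    return enabled and renews_last_min < threshold


threshold = renews_per_min_threshold(12)
print(threshold)                                           # prints: 20
print(would_enter_self_preservation(24, threshold, True))  # prints: False
print(would_enter_self_preservation(15, threshold, True))  # prints: True
```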
One instance of Eureka Server always goes down. Why that could be possibly happening?
I'm not able to answer this question for the time being; it could happen for multiple reasons.
You can most likely find the reason in the logs, and if you can post the logs here, that would be great.
What needs to be configured on Eureka Server so that it doesn't take a lot of memory?
From the information you have provided, I can't tell what is causing your memory issue. You have already specified -Xmx256m, and I haven't faced any memory issues with Eureka servers so far.
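But I can say that top is not the right tool for checking the memory consumed by your Java process. When the JVM starts, it takes memory from the operating system, and that is the amount you see in tools like ps and top. Besides the heap capped by -Xmx256m, it includes non-heap areas such as metaspace, thread stacks, and the code cache, so it can exceed 256 MB.
To see how the heap is actually used, it is better to use jstat or jvmtop.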
Is this approach to create multiple instances of Eureka Server correct?
It seems you are using the same hostname (eureka.instance.hostname) for both instances. Replication won't work if both instances use the same hostname.
Also make sure that both instances use the same application name.
Deployment is on AWS. Is there any specific configuration needed for Eureka Server on AWS?
Nothing AWS-specific as far as I know, other than making sure that the instances can communicate with each other.

Google Places API error 502 - The server encountered a temporary error

We run a website that obtains location data through the Google Places API. We have 150k daily searches available, which we haven't reached yet, as the website has only been live for a few weeks. We suddenly received a 502 error. A notification in the Console says: “The server encountered a temporary error and could not complete your request.” Is this a temporary error? Are there any suggestions on what we can do? The website has been unavailable for 40 minutes.
When you receive a 5xx status or UNKNOWN_ERROR in the response, you should implement retry logic. Google has the following recommendation in their web services documentation:
In rare cases something may go wrong serving your request; you may receive a 4XX or 5XX HTTP response code, or the TCP connection may simply fail somewhere between your client and Google's server. Often it is worthwhile re-trying the request as the followup request may succeed when the original failed. However, it is important not to simply loop repeatedly making requests to Google's servers. This looping behavior can overload the network between your client and Google causing problems for many parties.
A better approach is to retry with increasing delays between attempts. Usually the delay is increased by a multiplicative factor with each attempt, an approach known as Exponential Backoff.
https://developers.google.com/maps/documentation/directions/web-service-best-practices#exponential-backoff
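The recommendation above can be sketched in Python. This is a minimal illustration, not Google's client library: send is a hypothetical callable standing in for the real HTTP request and returning (status, body), and the attempt count and delay values are assumptions. It retries only 5xx responses and connection errors; extend the condition if you also want to retry specific 4xx codes such as 429.

```python
import random
import time
from typing import Callable, Tuple

Response = Tuple[int, str]  # (HTTP status code, body); stand-in for a real HTTP client


def call_with_backoff(send: Callable[[], Response], max_attempts: int = 5,
                      base_delay_s: float = 1.0, max_delay_s: float = 32.0,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send(), retrying 5xx responses and connection errors with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        is_last = attempt == max_attempts - 1
        try:
            status, body = send()
            if status < 500 or is_last:
                return status, body
        except ConnectionError:
            if is_last:
                raise
        # Double the delay each attempt, cap it, and add random jitter so that
        # many clients do not retry in lockstep.
        delay = min(max_delay_s, base_delay_s * 2 ** attempt)
        sleep(delay + random.uniform(0, delay / 2))
        attempt += 1
```

The sleep parameter is injectable so the delays can be checked in tests without actually waiting; in production the default time.sleep is used.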
However, if retry logic with exponential backoff doesn't help and the error persists for a long time, you should file a bug in the Google issue tracker.
I hope this addresses your doubt!
UPDATE
There was an issue on Google's side yesterday (November 6, 2017); you can refer to the following bug, which explains the issue:
https://issuetracker.google.com/issues/68938173

AWS Elastic Beanstalk: Looooooooong HEAD requests

I've just deployed a simple Java/Tomcat based application into Elastic Beanstalk (using the java8/tomcat8 config). Mostly the application works fine.
However, all HEAD requests seem to take 60 seconds. Feels like a timeout of some kind. I can't seem to find any settings regarding filtering or delaying particular types of requests. These requests work fine when I run locally. GET requests to the same URL work fine.
I've confirmed that both the Tomcat and the Apache instance on the server log the HEAD request instantly (which indicates they are done with it, right?).
I've confirmed (using telnet) that the client is not receiving any response header bytes until very late. This isn't a problem of the client waiting for a payload or something like that.
Furthermore, the delay is clearly tied to the load balancer's "Idle Timeout" setting. If I push that down to 5 seconds, then the HEAD requests take about 5 seconds, if I set the idle-timeout to 20 seconds then the HEAD requests take just about 20 seconds (always a few ms over). The default is 60s.
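The telnet check can also be scripted to compare HEAD and GET side by side. Below is a minimal Python sketch using the standard library's http.client; the function name time_to_headers is hypothetical, and the host and path in the usage comment are placeholders. getresponse() returns once the status line and headers have been read, so the elapsed time approximates the time to the first response headers.

```python
import http.client
import time
from typing import Tuple


def time_to_headers(host: str, port: int, method: str = "HEAD", path: str = "/",
                    use_tls: bool = False, timeout_s: float = 90.0) -> Tuple[int, float]:
    """Send one request and return (status, seconds until the response headers arrived)."""
    conn_cls = http.client.HTTPSConnection if use_tls else http.client.HTTPConnection
    # Socket timeout above the 60 s default idle timeout, so a stalled response
    # is measured rather than cut off by the client first.
    conn = conn_cls(host, port, timeout=timeout_s)
    try:
        start = time.monotonic()
        conn.request(method, path)
        response = conn.getresponse()  # returns after the status line and headers are parsed
        elapsed = time.monotonic() - start
        response.read()  # drain the body (empty for HEAD)
        return response.status, elapsed
    finally:
        conn.close()

# Example (placeholder host): time_to_headers("example.com", 443, "HEAD", "/", use_tls=True)
```

Running it from different networks, or directly against an instance versus through the load balancer, helps locate where the delay is introduced.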
What could be causing all HEAD requests (even those returning a 401 unauthorized error, no processing) to clog up the works like that?
It turns out the problem was a firewall issue at the local site. AWS Elastic Beanstalk was returning the responses in a timely manner, but they were getting clogged up in a local firewall. Grr..

Cloudfoundry URL (smile-spring.coludfoundry.com) Inaccessible with the Following Status

C:\>vmc stats smile
DL is deprecated, please use Fiddle
Getting stats for smile... OK
instance   cpu    memory         disk
0          0.5%   198.2K of 1G   54.6M of 2G
The 404 is coming from Tomcat, which means connections are being forwarded correctly. Looking at the location the 404 is coming from:
/WEB-INF/views/WEB-INF/views/home.jsp.jsp
one can only assume there is something suspect with the way requests are being routed within the application itself.
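One plausible way to end up with that doubled path, assuming the application uses Spring MVC with a view resolver configured with prefix /WEB-INF/views/ and suffix .jsp (the post doesn't show the configuration): the resolver builds prefix + view name + suffix, so a controller that returns the full JSP path instead of the logical view name ends up with both the prefix and the suffix duplicated. A Python sketch of that string building; resolve_view is hypothetical.

```python
def resolve_view(view_name: str, prefix: str = "/WEB-INF/views/", suffix: str = ".jsp") -> str:
    """Mimic a prefix/suffix view resolver: the JSP path is prefix + view name + suffix."""
    return prefix + view_name + suffix


print(resolve_view("home"))                    # prints: /WEB-INF/views/home.jsp
print(resolve_view("WEB-INF/views/home.jsp"))  # prints: /WEB-INF/views/WEB-INF/views/home.jsp.jsp
```

If this is the cause, returning the logical name ("home") from the controller would produce the expected path.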