Eureka Server memory, renew threshold is 0, self preservation issue - AWS - amazon-web-services

I deployed 2 instances of Eureka server and a total of 12 instances microservices. .
Renews (last min) is as expected 24. But Renew Threshold is always 0. Is this how it supposed to be when self preservation is turned on? Also seeing this error - THE SELF PRESERVATION MODE IS TURNED OFF. THIS MAY NOT PROTECT INSTANCE EXPIRY IN CASE OF NETWORK/OTHER PROBLEMS. What's the expected behavior in this case and how to resolve this if this is a problem?
As mentioned above, I deployed 2 instances of Eureka Server but after running for a while like around 19-20 hours, one instance of Eureka Server always goes down. Why that could be possibly happening? I checked the processes running using top command and found that Eureka Server is taking a lot of memory. What needs to be configured on Eureka Server so that it don't take a lot of memory?
Below is the configuration in the application.properties file of Eureka Server:
spring.application.name=eureka-server
eureka.instance.appname=eureka-server
eureka.instance.instance-id=${spring.application.name}:${spring.application.instance_id:${random.int[1,999999]}}
eureka.server.enable-self-preservation=false
eureka.datacenter=AWS
eureka.environment=STAGE
eureka.client.registerWithEureka=false
eureka.client.fetchRegistry=false
Below is the command that I am using to start the Eureka Server instances.
#!/bin/bash
java -Xms128m -Xmx256m -Xss256k -XX:+HeapDumpOnOutOfMemoryError -Dspring.profiles.active=stage -Dserver.port=9011 -Deureka.instance.prefer-ip-address=true -Deureka.instance.hostname=example.co.za -Deureka.client.serviceUrl.defaultZone=http://example.co.za:9012/eureka/ -jar eureka-server-1.0.jar &
java -Xms128m -Xmx256m -Xss256k -XX:+HeapDumpOnOutOfMemoryError -Dspring.profiles.active=stage -Dserver.port=9012 -Deureka.instance.prefer-ip-address=true -Deureka.instance.hostname=example.co.za -Deureka.client.serviceUrl.defaultZone=http://example.co.za:9011/eureka/ -jar eureka-server-1.0.jar &
Is this approach to create multiple instances of Eureka Server correct?
Deployment is on AWS. Is there any specific configuration needed for Eureka Server on AWS?
Spring Boot version: 2.3.4.RELEASE
I am new to all these, any help or direction will be a great help.

Let me try to answer your question one by one.
Renews (last min) is as expected 24. But Renew Threshold is always 0. Is this how it supposed to be when self-preservation is turned on?
What's the expected behaviour in this case and how to resolve this if this is a problem?
I can see that eureka.server.enable-self-preservation=false in your configuration, This is really needed if you want to remove an already registered application as soon as it fails to renew its lease.
Self-preservation feature is to prevent the above-mentioned situation since it can happen if there are some network hiccups. Say, you have two services A and B, both are registered to eureka and suddenly, B failed to renew its lease because of a temporary network hiccup. If Self-preservation is not there then B will be removed from the registry and A won't be able to reach B despite B is available.
So we can say that Self-preservation is a resiliency feature of eureka.
Renews threshold is the expected renews per minute, Eureka server enters self-preservation mode if the actual number of heartbeats in last minute(Renews) is less than the expected number of renews per minute(Renew Threshold) and
Of course, you can control the Renews threshold. you can configure renewal-percent-threshold (by default it is 0.85)
So in your case,
Total number of application instances = 12
You don't have eureka.instance.leaseRenewalIntervalInSeconds so default value 30s
and eureka.client.registerWithEureka=false
so Renewals(last minute) will be 24
You don't have renewal-percent-threshold configured, so the default value is 0.85
Number of renewals per application instance per minute = 2 (30s each)
so in case of self-preservation is enable Renews threshold will be calculated as 2 * 12 * 0.85 = 21 (rounded)
And in your case self-preservation is turned off, so Eureka won't calculate Renews Threshold
One instance of Eureka Server always goes down. Why that could be possibly happening?
I'm not able to answer this question time being, this can be because of multiple reasons.
You can find the reason mostly from logs, or if you can post logs here it would be great.
What needs to be configured on Eureka Server so that it doesn't take a lot of memory?
From the information that you have provided, I cannot tell about your memory issue and in addition to that you already specified -Xmx256m and I didn't face any memory issues with the eureka servers so far.
But I can say that top is not the right tool for checking the memory consumed by your java process. When JVM starts, It takes some memory from the operating system.
This is the amount of memory you see in tools like ps and top. so better use jstat or jvmtop
Is this approach to create multiple instances of Eureka Server correct?
It seems you are having the same hostname(eureka.instance.hostname) for both instances. Replication won't work if you use the same hostname.
And make sure that you have the same application names in both instances.
Deployment is on AWS. Is there any specific configuration needed for Eureka Server on AWS?
Nothing specifically for AWS as per my knowledge, other than making sure that the instances can communicate with each other.

Related

Google App Engine (GAE) basic scaling backend instance serves one request and undeploys

I have deployed an application (frontend and backend) in App Engine. First of all, I am using the free tier and I chose the default F1 for the frontend and B2 for the backend. I don't exactly understand the difference between B and F instances but based on their names, I chose them for backend and frontend respectively.
My backend is a Flask application that reads some data from Firestore on #app.before_first_request and "pre-caches" it for all future requests. This takes about 20-30 seconds before the first request is served so I really don't want the backend instance to become undeployed all the time.
Right now, my backend successfully serves one request (that I am making from the browser) and then immediately gets undeployed (basically I see no active instances in App Engine dashboard after the request is served). This means that every request once again has the same long delay upon server start that I don't want. I am not sure why this is happening because I've set idle timeout to 5 minutes. I know it is not a problem with my Flask application because it does not crash after a request on a local machine and I've done its memory profiling which is in bounds of B2 limits. This is my app.yaml for the backend:
runtime: python38
service: api
env_variables:
PORT: 8080
instance_class: B2
basic_scaling:
max_instances: 1
idle_timeout: 5m
Any insight would be appreciated!
Based on the information and behavior that you are exposing, please allow me to explain to you that both Scaling models are behaving as they are designed to do so.
“Automatic Scaling: It creates instances based on request rate, response latencies, and other application metrics. You can specify thresholds for each of these metrics, and a minimum number instances to keep running always.
Basic Scaling: Basic scaling creates instances only when your application receives requests. Each instance will be shut down when the application becomes idle. Basic scaling is ideal for work that is intermittent or driven by user activity.”
Use the following URL’s documentation as reference for those models and more of them How Instances are Managed.
Information added on 10/12/2021:
Hi,
I think the correct term is “shutdown” instead of “undeployed” Disabling your application. Looking at Instance States "an instance of a manual or basic scaled service can be either running or stopped. All instances of the same service and version share the same state." then looking at Scaling types "Basic scaling creates instances when your application receives requests. Each instance will be shut down when the application becomes idle. Basic scaling is ideal for work that is intermittent or driven by user activity." and the table's Startup and shutdown row for basic scaling "Instances are created on demand to handle requests and automatically shut down when idle, based on the idle_timeout configuration parameter. An instance that is manually stopped has 30 seconds to finish handling requests before it is forcibly terminated." and Scaling down "You can specify a minimum number of idle instances. Setting an appropriate number of idle instances for your application based on request volume allows your application to serve every request with little latency".
Could you please verify:
that the instance was not manually halted?
that instance is becoming idle?
that there are no background threads?
if functionality is the same when setting the max_instances to 2
that there are no logs showcasing an instance shutdown
that they are reaching the version with the updated the idle_timeout set

How to fix CloudRun error 'The request was aborted because there was no available instance'

I'm using managed CloudRun to deploy a container with concurrency=1. Once deployed, I'm firing four long-running requests in parallel.
Most of the time, all works fine -- But occasionally, I'm facing 500's from one of the nodes within a few seconds; logs only provide the error message provided in the subject.
Using retry with exponential back-off did not improve the situation; the retries also end up with 500s. StackDriver logs also do not provide further information.
Potentially relevant gcloud beta run deploy arguments:
--memory 2Gi --concurrency 1 --timeout 8m --platform managed
What does the error message mean exactly -- and how can I solve the issue?
This error message can appear when the infrastructure didn't scale fast enough to catch up with the traffic spike. Infrastructure only keeps a request in the queue for a certain amount of time (about 10s) then aborts it.
This usually happens when:
traffic suddenly largely increase
cold start time is long
request time is long
We also faced this issue when traffic suddenly increased during business hours. The issue is usually caused by a sudden increase in traffic and a longer instance start time to accommodate incoming requests. One way to handle this is by keeping warm-up instances always running i.e. configuring --min-instances parameters in the cloud run deploy command. Another and recommended way is to reduce the service cold start time (which is difficult to achieve in some languages like Java and Python)
I also experiment the problem. Easy to reproduce. I have a fibonacci container that process in 6s fibo(45). I use Hey to perform 200 requests. And I set my Cloud Run concurrency to 1.
Over 200 requests I have 8 similar errors. In my case: sudden traffic spike and long processing time. (Short cold start for me, it's in Go)
I was able to resolve this on my service by raising the max autoscaling container count from 2 to 10. There really should be no reason that 2 would be even close to too low for the traffic, but I suspect something about the Cloud Run internals were tying up to 2 containers somehow.
Setting the Max Retry Attempts to anything but zero will remedy this, as it did for me.

Simple HelloWorld app on cloudrun (or knative) seems too slow

I deployed a sample HelloWorld app on Google Cloud Run, which is basically k-native, and every call to the API takes 1.4 seconds at best, in an end-to-end manner. Is it supposed to be so?
The sample app is at https://cloud.google.com/run/docs/quickstarts/build-and-deploy
I deployed the very same app on my localhost as a docker container and it takes about 22ms, end-to-end.
The same app on my GKE cluster takes about 150 ms, end-to-end.
import os
from flask import Flask
app = Flask(__name__)
#app.route('/')
def hello_world():
target = os.environ.get('TARGET', 'World')
return 'Hello {}!\n'.format(target)
if __name__ == "__main__":
app.run(debug=True,host='0.0.0.0',port=int(os.environ.get('PORT', 8080)))
I am a little experience in FaaS and I expect API calls would get faster as I invoked them in a row. (as in cold start vs. warm start)
But no matter how many times I execute the command it doesn't go below 1.4 seconds.
I think the network distance isn't the dominant factor here. The round-trip time via ping to the API endpoint is only 50ms away, more or less
So my questions are as follows:
Is it potentially an unintended bug? Is it a technical difficulty which will be resolved eventually? Or maybe nothing's wrong, it's just the SLA of k-native?
If nothing's wrong with Google Cloud Run and/or k-native, what is the dominant time-consuming factor here for my API call? I'd love to learn the mechanism.
Additional Details:
Where I am located at: Seoul/Asia
The region for my Cloud Run app: us-central1
type of Internet connection I am testing under: Business, Wired
app's container image size: 343.3MB
the bucket location that Container Registry is using: gcr.io
WebPageTest from Seoul/Asia (warmup time):
Content Type: text/html
Request Start: 0.44 s
DNS Lookup: 249 ms
Initial Connection: 59 ms
SSL Negotiation: 106 ms
Time to First Byte: 961 ms
Content Download: 2 ms
WebPageTest from Chicago/US (warmup time):
Content Type: text/html
Request Start: 0.171 s
DNS Lookup: 41 ms
Initial Connection: 29 ms
SSL Negotiation: 57 ms
Time to First Byte: 61 ms
Content Download: 3 ms
ANSWER by Steren, the Cloud Run product manager
We have detected high latency when calling Cloud Run services from
some particular regions in the world. Sadly, Seoul seems to be one of
them.
[Update: This person has a networking problem in his area. I tested his endpoint from Seattle with no problems. Details in the comments below.]
I have worked with Cloud Run constantly for the past several months. I have deployed several production applications and dozens of test services. I am in Seattle, Cloud Run is in us-central1. I have never noticed a delay. Actually, I am impressed with how fast a container starts up.
For one of my services, I am seeing cold start time to first byte of 485ms. Next invocation 266ms, 360ms. My container is checking SSL certificates (2) on the Internet. The response time is very good.
For another service which is a PHP website, time to first byte on cold start is 312ms, then 94ms, 112ms.
What could be factors that are different for you?
How large is your container image? Check Container Registry for the size. My containers are under 100 MB. The larger the container the longer the cold start time.
Where is the bucket located that Container Registry is using? You want the bucket to be in us-central1 or at least US. This will change soon with when new Cloud Run regions are announced.
What type of Internet connection are you testing under? Home based or Business. Wireless or Ethernet connection? Where in the world are you testing from? Launch a temporary Compute Engine instance, repeat your tests to Cloud Run and compare. This will remove your ISP from the equation.
Increase the memory allocated to the container. Does this affect performance? Python/Flask does not require much memory, my containers are typically 128 MB and 256 MB. Container images are loaded into memory, so if you have a bloated container, you might now have enough memory left reducing performance.
What does Stackdriver logs show you? You can see container starts, requests, and container terminations.
(Cloud Run product manager here)
We have detected high latency when calling Cloud Run services from some particular regions in the world. Sadly, Seoul seems to be one of them.
We will explicitly capture this as a Known issue and we are working on fixing this before General Availability. Feel free to open a new issue in our public issue tracker

AWS EC2 Instance starting time

Sometimes the starting time of the instance takes more than 5 minutes. In this case, the Status Checks takes more than 4 minutes.
How can I make the instance run less than a minute, including checking the status?
You do not need to wait for the Instance Status Check to complete before using an Amazon EC2 instance.
Linux instances are frequently ready 60-90 seconds after launch. Windows instances take considerably longer because the AMI has been configured for sysprep, which involves a reboot.
New instances take longer to be ready than existing instances because they typically run code on first startup. So, if you Stop and instance and later Start it again, the instance will be available quite quickly (especially Linux instances).
I'm not sure that "You do not need to wait for the Instance Status Check to complete" is
correct, and if the status check failed for any reason you (obviously) have a problem and should investigate before using.
Doing a quick check using a aws jdk script creating a "Nano" instance from a linux image loaded with ubuntu, apache, tomcat, java, mysql etc it took 45 secs to get "running" and 2 mins 15 secs to finish the Status Checks.
Starting an existing "stopped" instance ("Nano") took 18 secs and 2 mins 15 secs to finish the status checks.
You can't change instance health check it manages by aws. When a system status check fails, you can choose to wait for AWS to fix the issue, or you can resolve it yourself by stop and start the instance. which in most cases migrates it to a new host computer.
The following are examples of problems that can cause system status checks to fail:
Loss of network connectivity, Loss of system power
Software issues on the physical host
Hardware issues on the physical host that impact network reachability.
The instance will be accessible once boot. it should not take 5 min of time. you can check instance boot logs or screen from
ec2 --> Action --> Instance settings`Get system log` and `Get instance screenshot` and optimized instance up time.

How to determine that a jvm app does more GC than normal work?

We recently had a problem that our EC2 instances had 90-100 percent cpu load cause of a bug in a library we include that created to many objects instead of reusing them (which was easy solvable), so we spent too much time in GC.
Unfortunately the AWS health checks and instance status metrics didn't cause the overloaded instances to be stopped and then new ones restarted, so after some time we hit the max autoscaling number and....died. Also our own health checks inside the app which are used for the ELB are so simple that they answered often enough to obviously not cause the instances to be terminated...and restarted, which would mitigate that problem for quite some time.
My idea is now to use our custom health check which is already included in the ELB health checks to report a failure if we spent to much time in GC.
How would I do such a thing inside the app?
There are a number of JVM parameters that allow GC monitoring
-Xloggc:<file> // logs gc activity to a file
-XX:+PrintGCDetails // tells you how different generations are impacted
You can either parse these logs yourself or use specific tool such as GCViewer to analyse gc activity.
Use GarbageCollectorMXBean:
long gcTime = 0;
for (GarbageCollectorMXBean gcBean : ManagementFactory.getGarbageCollectorMXBeans()) {
gcTime += gcBean.getCollectionTime();
}
long jvmUptime = ManagementFactory.getRuntimeMXBean().getUptime();
System.out.println("GC ratio: " + (100 * gcTime / jvmUptime) + "%");
You can use VisualVM to monitor what happens inside the JVM and you can monitor remote instances via JMX. You did not describe which application container that you are using (Apache Tomcat, GlassFish etc.), you can set up a JMX connector like this in the case of Tomcat.
Don't forget to adjust Security Groups in AWS to have the proper permission to access the JMX port.
The JVM flags PrintGCApplicationConcurrentTime and PrintGCApplicationStoppedTime will log how long the application was active or suspended. They're a bit of a misnomers since they actually measure time spent in and out of safepoints, not just GCs.