I have tried several times in the last two weeks to log on to a c5.metal instance. Each time I get "Initializing" in the status checks field, but after 10 minutes it is still "Initializing" and I'm not able to log on. I have had success with c5.metal before, but not any more.
Today I also tried to get an m5.metal instance. This time the instance initialized successfully after 10 minutes, but I was not able to log on with PuTTY. I stopped the instance, and after about 30 minutes I tried again; this time it did not get past "Initializing" in the status check field, and I stopped it after 15 minutes.
I get billed for the 10 to 15 minute bare metal wait periods, even when initialization doesn't complete. I have no problems with AWS virtual instances.
Thanks for any ideas on what I can do to get the bare metal instances to work.
To reproduce your situation, I did the following:
Launched an Amazon EC2 instance in Ohio:
Instance Type: c5.metal
AMI: Ubuntu Server 18.04 LTS (HVM), SSD Volume Type
Network: In my Default VPC so that it uses a Public Subnet
Security Group: Default settings, which grant port 22 access from the Internet
The instance entered the running state very quickly; Status Checks showed "Initializing"
It took about 8 minutes until the status checks were showing 2/2 checks (it might have been faster, but I was testing other things in the meantime).
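As an aside, the same launch-and-wait can be scripted with the AWS CLI if you want to time it precisely. A minimal sketch (the AMI, key pair, subnet and security group IDs below are placeholders):
#!/bin/bash
# Launch a c5.metal instance and capture its ID
INSTANCE_ID=$(aws ec2 run-instances \
  --image-id ami-0xxxxxxxxxxxxxxxx \
  --instance-type c5.metal \
  --key-name my-key-pair \
  --subnet-id subnet-0xxxxxxxxxxxxxxxx \
  --security-group-ids sg-0xxxxxxxxxxxxxxxx \
  --query 'Instances[0].InstanceId' --output text)
# Poll the status checks; on bare metal this can stay "initializing" for 8-10 minutes
while true; do
  STATUS=$(aws ec2 describe-instance-status \
    --instance-ids "$INSTANCE_ID" \
    --query 'InstanceStatuses[0].InstanceStatus.Status' --output text)
  echo "$(date +%T) instance status: $STATUS"
  [ "$STATUS" = "ok" ] && break
  sleep 30
done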
I was able to successfully log in to the instance:
Welcome to Ubuntu 18.04.4 LTS (GNU/Linux 4.15.0-1065-aws x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
System information as of Sat Jun 6 23:21:18 UTC 2020
System load: 0.02 Processes: 924
Usage of /: 13.7% of 7.69GB Users logged in: 0
Memory usage: 0% IP address for enp125s0: 172.31.9.77
Swap usage: 0%
0 packages can be updated.
0 updates are security updates.
The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.
ubuntu@ip-172-31-9-77:~$
(Actually, I first tried to log in as ec2-user and it took me a while to realize this was an Ubuntu AMI, so I connected as ubuntu.)
It is possible that the slow startup is due to the Operating System or hardware checking the 192GB of RAM that is allocated to the instance.
I booted another instance using an Amazon Linux 2 AMI and it required approximately 7 minutes before I could connect.
I also noticed that the c5.metal instances did not provide anything for "Get System Log" or "Get Instance Screenshot". This might be a result of using a bare-metal instance.
I joined John Rotenstein's twitch.tv channel and he showed how he got a c5.metal instance. What I learned is that if a metal instance does not work in the Availability Zone you had chosen, try launching a new instance in a different Availability Zone. For example, I had a c5.metal instance in us-east-2a. Following John's directions, I launched an instance in us-east-2c and after about 8 minutes the instance was ready for use.
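If you want to retry the launch in a specific Availability Zone from the CLI, run-instances accepts a placement argument. A rough sketch (placeholder IDs again):
aws ec2 run-instances \
  --image-id ami-0xxxxxxxxxxxxxxxx \
  --instance-type c5.metal \
  --key-name my-key-pair \
  --security-group-ids sg-0xxxxxxxxxxxxxxxx \
  --placement AvailabilityZone=us-east-2c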
I deployed 2 instances of Eureka server and a total of 12 instances of microservices.
Renews (last min) is 24, as expected, but Renews threshold is always 0. Is this how it is supposed to be when self-preservation is turned on? I am also seeing this error - THE SELF PRESERVATION MODE IS TURNED OFF. THIS MAY NOT PROTECT INSTANCE EXPIRY IN CASE OF NETWORK/OTHER PROBLEMS. What is the expected behavior in this case, and how do I resolve it if it is a problem?
As mentioned above, I deployed 2 instances of Eureka Server, but after running for around 19-20 hours one instance of Eureka Server always goes down. Why could that be happening? I checked the running processes using the top command and found that Eureka Server is taking a lot of memory. What needs to be configured on Eureka Server so that it doesn't take a lot of memory?
Below is the configuration in the application.properties file of Eureka Server:
spring.application.name=eureka-server
eureka.instance.appname=eureka-server
eureka.instance.instance-id=${spring.application.name}:${spring.application.instance_id:${random.int[1,999999]}}
eureka.server.enable-self-preservation=false
eureka.datacenter=AWS
eureka.environment=STAGE
eureka.client.registerWithEureka=false
eureka.client.fetchRegistry=false
Below is the command that I am using to start the Eureka Server instances.
#!/bin/bash
java -Xms128m -Xmx256m -Xss256k -XX:+HeapDumpOnOutOfMemoryError -Dspring.profiles.active=stage -Dserver.port=9011 -Deureka.instance.prefer-ip-address=true -Deureka.instance.hostname=example.co.za -Deureka.client.serviceUrl.defaultZone=http://example.co.za:9012/eureka/ -jar eureka-server-1.0.jar &
java -Xms128m -Xmx256m -Xss256k -XX:+HeapDumpOnOutOfMemoryError -Dspring.profiles.active=stage -Dserver.port=9012 -Deureka.instance.prefer-ip-address=true -Deureka.instance.hostname=example.co.za -Deureka.client.serviceUrl.defaultZone=http://example.co.za:9011/eureka/ -jar eureka-server-1.0.jar &
Is this approach to create multiple instances of Eureka Server correct?
Deployment is on AWS. Is there any specific configuration needed for Eureka Server on AWS?
Spring Boot version: 2.3.4.RELEASE
I am new to all of this; any help or direction will be a great help.
Let me try to answer your questions one by one.
Renews (last min) is 24 as expected, but Renews threshold is always 0. Is this how it is supposed to be when self-preservation is turned on?
What's the expected behaviour in this case and how to resolve this if this is a problem?
I can see eureka.server.enable-self-preservation=false in your configuration. This is really only needed if you want an already registered application to be removed from the registry as soon as it fails to renew its lease.
The self-preservation feature exists to prevent exactly that situation, since it can be caused by nothing more than a network hiccup. Say you have two services, A and B, both registered with Eureka, and B suddenly fails to renew its lease because of a temporary network hiccup. Without self-preservation, B would be removed from the registry and A would no longer be able to reach B, even though B is available.
So we can say that self-preservation is a resiliency feature of Eureka.
Renews threshold is the expected number of renews per minute. The Eureka server enters self-preservation mode if the actual number of heartbeats in the last minute (Renews) is less than the expected number of renews per minute (Renews threshold).
You can control the Renews threshold by configuring renewal-percent-threshold (0.85 by default).
So in your case:
Total number of application instances = 12
You don't have eureka.instance.leaseRenewalIntervalInSeconds set, so the default of 30s applies
and eureka.client.registerWithEureka=false, so the Eureka servers themselves don't add to the count
so Renews (last min) will be 24 (12 instances × 2 heartbeats per minute)
You don't have renewal-percent-threshold configured, so the default value is 0.85
Number of renewals per application instance per minute = 2 (one every 30s)
So if self-preservation were enabled, the Renews threshold would be calculated as 2 * 12 * 0.85 = 20.4, i.e. about 20
But in your case self-preservation is turned off, so Eureka doesn't calculate the Renews threshold at all, which is why it shows 0
One instance of Eureka Server always goes down. Why could that be happening?
I'm not able to answer this question for the time being; it could be happening for several reasons.
The logs are usually the place to find the reason; if you can post them here, that would be great.
What needs to be configured on Eureka Server so that it doesn't take a lot of memory?
From the information you have provided I can't tell much about your memory issue. You already specify -Xmx256m, and I haven't faced any memory issues with Eureka servers so far.
But I can say that top is not the right tool for checking the memory consumed by your Java process. When the JVM starts, it takes some memory from the operating system, and that is the amount you see in tools like ps and top, so it's better to use jstat or jvmtop.
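If you want a quick look at the actual heap rather than the resident size, a rough sketch using jstat against your own startup command (the pgrep pattern simply matches the command from your script):
# Find the PID of the Eureka server listening on 9011
PID=$(pgrep -f 'server.port=9011.*eureka-server-1.0.jar')
# Print eden/survivor/old-gen utilisation (%) and GC counts every 5 seconds
jstat -gcutil "$PID" 5000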
Is this approach to create multiple instances of Eureka Server correct?
It seems you are using the same hostname (eureka.instance.hostname) for both instances. Replication won't work if both peers use the same hostname.
Also make sure that both instances use the same application name.
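Purely as an illustration and not a drop-in fix: with distinct hostnames (peer1/peer2 below are hypothetical DNS names) and with registerWithEureka/fetchRegistry switched to true so the peers replicate, the startup commands might look something like this:
# Peer 1 on port 9011, pointing at peer 2
java -Xms128m -Xmx256m -Xss256k -XX:+HeapDumpOnOutOfMemoryError -Dspring.profiles.active=stage -Dserver.port=9011 -Deureka.instance.hostname=peer1.example.co.za -Deureka.client.registerWithEureka=true -Deureka.client.fetchRegistry=true -Deureka.client.serviceUrl.defaultZone=http://peer2.example.co.za:9012/eureka/ -jar eureka-server-1.0.jar &
# Peer 2 on port 9012, pointing at peer 1
java -Xms128m -Xmx256m -Xss256k -XX:+HeapDumpOnOutOfMemoryError -Dspring.profiles.active=stage -Dserver.port=9012 -Deureka.instance.hostname=peer2.example.co.za -Deureka.client.registerWithEureka=true -Deureka.client.fetchRegistry=true -Deureka.client.serviceUrl.defaultZone=http://peer1.example.co.za:9011/eureka/ -jar eureka-server-1.0.jar &
Note that prefer-ip-address is dropped here; with both processes on one box it would make the peers advertise the same IP and defeat the distinct hostnames.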
Deployment is on AWS. Is there any specific configuration needed for Eureka Server on AWS?
Nothing specific to AWS as far as I know, other than making sure that the instances can communicate with each other.
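For example, if both servers sit in the same security group, something along these lines (placeholder group ID) would allow members of that group to reach the Eureka ports on each other:
aws ec2 authorize-security-group-ingress \
  --group-id sg-0xxxxxxxxxxxxxxxx \
  --protocol tcp \
  --port 9011-9012 \
  --source-group sg-0xxxxxxxxxxxxxxxx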
I am trying to create a copy of a running system in GCE. I took a snapshot of the running system and set up a new instance from it. When started, the new instance shows as running, but the serial console log only says:
SeaBIOS (version 1.8.2-google)
Total RAM Size = 0x0000000100000000 = 4096 MiB
CPUs found: 2 Max CPUs supported: 2
found virtio-scsi at 0:3
virtio-scsi vendor='Google' product='PersistentDisk' rev='1' type=0 removable=0
virtio-scsi blksize=512 sectors=20971520 = 10240 MiB
drive 0x000f22f0: PCHS=0/0/0 translation=lba LCHS=1024/255/63 s=20971520
Sending Seabios boot VM event.
Booting from Hard Disk 0...
So the system is not reachable. I tried to check the UUID of the drive, but it seems to be the right one. Can someone tell me how to fix this?
Best regards
Alex
What you're experiencing looks very much like a bug. You may try to stop the VM, take a fresh snapshot, and create a new instance from that.
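A rough gcloud sketch of that retry, assuming the original VM and its boot disk are called my-vm in us-central1-a (all names are placeholders):
# Stop the original VM so the disk is quiesced before snapshotting
gcloud compute instances stop my-vm --zone us-central1-a
# Take a fresh snapshot of the boot disk
gcloud compute disks snapshot my-vm --snapshot-names my-vm-snap --zone us-central1-a
# Create a new disk from the snapshot and boot a new instance from it
gcloud compute disks create my-vm-copy-disk --source-snapshot my-vm-snap --zone us-central1-a
gcloud compute instances create my-vm-copy --disk name=my-vm-copy-disk,boot=yes --zone us-central1-a
# Watch the serial console to see whether the copy gets past the bootloader
gcloud compute instances get-serial-port-output my-vm-copy --zone us-central1-a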
My recommendation would be to contact GCP support for more immediate help (however, it's a paid service), or to open a new issue in the Google IssueTracker to get help for free, although there are no ETAs there.
Sometimes an instance takes more than 5 minutes to start; in such cases the status checks alone take more than 4 minutes.
How can I get the instance running in less than a minute, including the status checks?
You do not need to wait for the Instance Status Check to complete before using an Amazon EC2 instance.
Linux instances are frequently ready 60-90 seconds after launch. Windows instances take considerably longer because the AMI has been configured for sysprep, which involves a reboot.
New instances take longer to be ready than existing instances because they typically run code on first startup. So, if you Stop an instance and later Start it again, the instance will be available quite quickly (especially Linux instances).
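If you are scripting, the AWS CLI exposes both milestones as waiters, so you can choose which one to block on (the instance ID is a placeholder):
# Returns as soon as the instance reaches the "running" state (often 60-90 seconds for Linux)
aws ec2 wait instance-running --instance-ids i-0xxxxxxxxxxxxxxxx
# Returns only once both status checks pass (typically several minutes later)
aws ec2 wait instance-status-ok --instance-ids i-0xxxxxxxxxxxxxxxx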
I'm not sure that "You do not need to wait for the Instance Status Check to complete" is correct; if the status check failed for any reason you (obviously) have a problem and should investigate before using the instance.
Doing a quick check with an AWS SDK script creating a "Nano" instance from a Linux image loaded with Ubuntu, Apache, Tomcat, Java, MySQL etc., it took 45 secs to reach "running" and 2 mins 15 secs to finish the Status Checks.
Starting an existing "stopped" instance ("Nano") took 18 secs to reach "running" and 2 mins 15 secs to finish the status checks.
You can't change the instance health checks; they are managed by AWS. When a system status check fails, you can choose to wait for AWS to fix the issue, or you can resolve it yourself by stopping and starting the instance, which in most cases migrates it to a new host computer.
The following are examples of problems that can cause system status checks to fail:
Loss of network connectivity
Loss of system power
Software issues on the physical host
Hardware issues on the physical host that impact network reachability
The instance will be accessible once it has booted; that should not take 5 minutes. You can check the instance's boot log or screen via EC2 --> Actions --> Instance settings --> Get system log / Get instance screenshot, and optimize the instance's startup time from there.
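The same stop/start and log retrieval can be done from the CLI; a small sketch with a placeholder instance ID:
# Stop and start (not reboot) to move the instance to a new underlying host
aws ec2 stop-instances --instance-ids i-0xxxxxxxxxxxxxxxx
aws ec2 wait instance-stopped --instance-ids i-0xxxxxxxxxxxxxxxx
aws ec2 start-instances --instance-ids i-0xxxxxxxxxxxxxxxx
# Pull the boot log and a screenshot to see where the boot time is going
aws ec2 get-console-output --instance-id i-0xxxxxxxxxxxxxxxx --output text
aws ec2 get-console-screenshot --instance-id i-0xxxxxxxxxxxxxxxx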
I have a large web-based application running in AWS with numerous EC2 instances. Occasionally -- about twice or thrice per week -- I receive an alarm notification from my Sensu monitoring system notifying me that one of my instances has hit 100% CPU.
This is the notification:
CheckCPU TOTAL WARNING: total=100.0 user=0.0 nice=0.0 system=0.0 idle=25.0 iowait=100.0 irq=0.0 softirq=0.0 steal=0.0 guest=0.0
Host: my_host_name
Timestamp: 2016-09-28 13:38:57 +0000
Address: XX.XX.XX.XX
Check Name: check-cpu-usage
Command: /etc/sensu/plugins/check-cpu.rb -w 70 -c 90
Status: 1
Occurrences: 1
This seems to be a momentary occurrence and the CPU goes back down to normal levels within seconds, so it seems like nothing to get too worried about. But I'm still curious why it is happening. Notice that the CPU is entirely taken up by iowait (100%).
FYI, Amazon's monitoring system doesn't notice this blip. See the images below showing the CPU and IO levels at 13:38.
Interestingly, AWS tells me that this instance will be retired soon. Might the two be related?
AWS is only displaying a 5 minute period, and it looks like your CPU check is set to send alarms after a single occurrence. If your CPU check's interval is less than 5 minutes, the AWS console may be rolling up the average to mask the actual CPU spike.
I'd recommend narrowing down the AWS monitoring console to a smaller period to see if you see the spike there.
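You can also pull the raw CloudWatch datapoints at 1-minute resolution around the alert time and compare them with Sensu (1-minute granularity requires detailed monitoring; the instance ID and time window are placeholders):
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0xxxxxxxxxxxxxxxx \
  --start-time 2016-09-28T13:30:00Z \
  --end-time 2016-09-28T13:45:00Z \
  --period 60 \
  --statistics Maximum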
I would add this as a comment, but I don't have the reputation to do so.
I have noticed my EC2 instances doing this too, but for far longer, and after apt-get update + upgrade.
I thought it was an Apache thing, so I started using Nginx on a new instance to test, and it did the same thing: I ran apt-get a few hours ago, then came back to find the instance at full CPU - for hours! Good thing it is just a test machine, but I wonder what it is about ubuntu/apt-get that might have caused this. From now on I guess I will have to reboot the machine after apt-get, as that seems to be the only way to bring it back to normal.
We are running into memory issues on our RDS PostgreSQL instance, i.e. memory usage of the PostgreSQL server reaches almost 100%, resulting in stalled queries and subsequent downtime of the production app.
The memory usage of the RDS instance doesn't go up gradually, but suddenly, within a period of 30 minutes to 2 hours.
Most of the time this happens, we see a lot of traffic from bots, though there is no specific pattern in terms of frequency: it could happen anywhere from 1 week to 1 month after the previous occurrence.
Disconnecting all clients and then restarting the application doesn't help either, as the memory usage goes up again very rapidly.
Running a full vacuum (VACUUM FULL) is the only thing we have found that resolves the issue when it occurs.
What we have tried so far
Periodic vacuuming (not full vacuuming) of some tables that get frequent updates.
Stopped storing web sessions in the DB, as they are highly volatile and result in a lot of dead tuples.
Neither of these has helped.
We have considered using tools like pgcompact / pg_repack, as they don't acquire an exclusive lock. However, these can't be used with RDS.
We now see a strong possibility that this has to do with the memory bloat that can happen on PostgreSQL with prepared statements in Rails 4, as discussed on the following pages:
Memory leaks on postgresql server after upgrade to Rails 4
https://github.com/rails/rails/issues/14645
As a quick trial, we have now disabled prepared statements in our rails database configuration, and are observing the system. If the issue re-occurs, this hypothesis would be proven wrong.
Setup details:
We run our production environment inside Amazon Elastic Beanstalk, with the following configuration:
App servers
OS : 64bit Amazon Linux 2016.03 v2.1.0 running Ruby 2.1 (Puma)
Instance type: r3.xlarge
Root volume size: 100 GiB
Number of app servers : 2
Rails workers running on each server : 4
Max number of threads in each worker : 8
Database pool size : 50 (applicable for each worker)
Database (RDS) Details:
PostgreSQL Version: PostgreSQL 9.3.10
RDS Instance type: db.m4.2xlarge
Rails Version: 4.2.5
Current size on disk: 2.2GB
Number of tables: 94
The environment is monitored with AWS cloudwatch and NewRelic.
Periodic vacuuming should help contain table bloat, but not index bloat.
1) Have you tried more aggressive autovacuum parameters?
2) Have you tried routine reindexing? If locking is a concern, then consider:
DROP INDEX CONCURRENTLY ...
CREATE INDEX CONCURRENTLY ...
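As a rough illustration of both suggestions on RDS (the parameter group name, endpoint, table and index names are all placeholders): autovacuum is tuned through the instance's DB parameter group, and an index can be rebuilt with CONCURRENTLY to avoid a long exclusive lock.
# Make autovacuum more aggressive via the DB parameter group
aws rds modify-db-parameter-group \
  --db-parameter-group-name my-postgres93-params \
  --parameters "ParameterName=autovacuum_vacuum_scale_factor,ParameterValue=0.05,ApplyMethod=immediate" \
               "ParameterName=autovacuum_vacuum_cost_limit,ParameterValue=2000,ApplyMethod=immediate"
# Rebuild a bloated index without taking an exclusive lock on the table
PSQL="psql -h mydb.xxxxxxxx.us-east-1.rds.amazonaws.com -U app_user -d app_db"
$PSQL -c "CREATE INDEX CONCURRENTLY idx_users_email_new ON users (email);"
$PSQL -c "DROP INDEX CONCURRENTLY idx_users_email;"
$PSQL -c "ALTER INDEX idx_users_email_new RENAME TO idx_users_email;"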