AWS EKS EC2 instance reachability check failed

AWS EKS EC2 instance reachability check failed. It was working fine yesterday before I shut down the instance, but now I can't connect to it using MobaXterm or PuTTY.
Part of the system log shows a failure to start the Login Service:
Starting Dump dmesg to /var/log/dmesg...
[  OK  ] Started libstoragemgmt plug-in server daemon.
Starting ACPI Event Daemon...
[  OK  ] Started Monero miner service.
[  OK  ] Started RPC bind service.
[  OK  ] Started Resets System Activity Logs.
[  OK  ] Started Hardware RNG Entropy Gatherer Wake threshold service.
[FAILED] Failed to start Login Service.
See 'systemctl status systemd-logind.service' for details.
[  OK  ] Started NTP client/server.
[  OK  ] Started ACPI Event Daemon.
[  OK  ] Started Dump dmesg to /var/log/dmesg.
I tried changing the instance type from micro to small to increase the CPU and RAM, but it didn't solve the issue. I also tried rebooting and stopping/starting the instance, but that doesn't help either.

On the EC2 instance, there seems to be a problem with the systemd-logind service. Among other things, this service is in charge of managing user logins and logouts.
To view the system logs and file system, you can try stopping the instance, detaching the root EBS volume, attaching it to a separate EC2 instance, and then mounting it on that instance.
Once you have identified any issues and fixed them, you can unmount the volume, detach it from the temporary instance, and re-attach it to the problematic instance. Then, start the instance and try to connect to it again.
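If you prefer the AWS CLI for the detach/attach steps, a rough sketch looks like this. All instance, volume, and device identifiers below are placeholders (substitute your own), and the /dev/sdf attachment may show up as /dev/xvdf inside the rescue instance:
# Stop the unreachable instance so its root volume can be detached
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
# Detach the root volume and attach it to a healthy rescue instance as a secondary disk
aws ec2 detach-volume --volume-id vol-0123456789abcdef0
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 --instance-id i-0fedcba9876543210 --device /dev/sdf
# On the rescue instance, mount the volume and inspect the logs
sudo mkdir /rescue && sudo mount /dev/xvdf1 /rescue
sudo less /rescue/var/log/messages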

The systemd-logind failure can explain why login is impossible, but the failed reachability check is a symptom of something else.
First, follow the steps in the official guide for troubleshooting instances with failed status checks.
You can also try EC2Rescue, mounting the root volume on a separate instance: it can automatically fix many common issues on the volume.
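If you would rather not do the volume surgery by hand and have AWS Systems Manager available, one option is the AWSSupport-ExecuteEC2Rescue automation document, which runs EC2Rescue against an unreachable instance for you. The instance ID below is a placeholder, and the parameter names can vary between document versions, so check the document's description first:
aws ssm start-automation-execution --document-name "AWSSupport-ExecuteEC2Rescue" --parameters "UnreachableInstanceId=i-0123456789abcdef0"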

Related

Getting "Connection refused" when running nodetool on Cassandra cluster on AWS EC2

I just did the setup of a Cassandra cluster.
I have changed the seed, listen_address, and broadcast_address in the cassandra.yaml file. But when I run the command
$ nodetool flush system
the command returns this error:
nodetool: Failed to connect to '127.0.0.1:7199' - ConnectException:
'Connection refused (Connection refused)'.
In the file /etc/cassandra/cassandra-env.sh I modified JVM_OPTS as shown in the screenshot.
I use AWS; at the beginning I was on a t2.micro instance, but I switched to a t2.large as recommended in many articles.
Finally, my ports are open as shown in this screenshot, and I'm using Ubuntu.
By default, remote JMX connections to Cassandra nodes are disabled. If you're running nodetool commands on the EC2 instance itself, it isn't necessary to modify the JVM options in cassandra-env.sh.
In fact, we discourage allowing remote JMX connections for security reasons. Only allow remote access if you're an expert and know what you're doing. Cheers!
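If you want to sanity-check the default local-only JMX setup after reverting cassandra-env.sh to the stock file, something like the following on the node itself should be enough (a sketch assuming an Ubuntu package install managed by systemd):
# Restart Cassandra after restoring the packaged cassandra-env.sh
sudo systemctl restart cassandra
# Confirm JMX is listening on 7199 (bound to localhost by default)
ss -tlnp | grep 7199
# Run nodetool locally; no remote JMX configuration is needed
nodetool status
nodetool flush system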

Cannot access GCP VM instance

I've been trying to connect to a VM instance for the past couple of days now. Here's what I've tried:
Trying to SSH into it returns username@ipaddress: Permission denied (publickey).
Using the Google Cloud SDK returns this:
No zone specified. Using zone [us-central1-a] for instance: [instancename].
Updating project ssh metadata...done.
Waiting for SSH key to propagate.
FATAL ERROR: No supported authentication methods available (server sent: publickey)
ERROR: (gcloud.compute.ssh) Could not SSH into the instance. It is possible that your SSH key has not propagated to the instance yet. Try running this command again. If you still cannot connect, verify that the firewall and instance are set to accept ssh traffic.
Using the browser SSH just gets stuck on "Transferring SSH keys to the VM."
Using PuTTY also results in No supported authentication methods available (server sent: publickey).
I checked the serial console and found this:
systemd-hostnamed.service: Failed to run 'start' task: No space left on device
I did recently resize the disk and did restart the VM, but this error still occurs.
Access to port 22 is allowed in the firewall rules. What can I do to fix this?
After increasing the disk size you need to restart the instance so the filesystem can be resized; that's required here because the disk has already run out of space.
If you have not already done so, create a snapshot of the VM's boot disk.
Try to restart the VM.
If you still can't access the VM, do the following:
Stop the VM:
gcloud compute instances stop VM_NAME
Replace VM_NAME with the name of your VM.
Increase the size of the boot disk:
gcloud compute disks resize BOOT_DISK_NAME --size DISK_SIZE
Replace the following:
BOOT_DISK_NAME: the name of your VM's boot disk
DISK_SIZE: the new larger size, in gigabytes, for the boot disk
Start the VM:
gcloud compute instances start VM_NAME
Reattempt to SSH to the VM.
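For example, a concrete run of the steps above might look like the following. The VM name, zone, and size are placeholders, and by default the boot disk usually has the same name as the VM:
gcloud compute instances stop my-vm --zone us-central1-a
gcloud compute disks resize my-vm --size 50GB --zone us-central1-a
gcloud compute instances start my-vm --zone us-central1-a
gcloud compute ssh my-vm --zone us-central1-a
df -h /   # once you're back in, confirm the root filesystem now has free space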

How to save the state of the EC2 instances upon connection loss from local machine?

I need to train a deep learning model on AWS EC2 instances. I can connect to the instances over SSH and, after establishing the connection, run training on them. If my Wi-Fi goes down I lose the connection to the instances with a "Broken pipe" error, so I have to establish the SSH connection again; it feels like restarting the instances from scratch.
How can I save the state of the running instances so that after reconnecting I can get back to the previous state?
Running the command in the background is one way to tackle this situation.
You can run commands using 'nohup' and they will continue even if your SSH session disconnects.
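For example, a training run started like this (the script name and log file are placeholders) will keep going after your SSH session drops, and you can check on it when you reconnect:
nohup python train.py > train.log 2>&1 &   # detach the job from the terminal
tail -f train.log                          # after reconnecting, watch the progress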

AWS EC2 Instance connection reset by port 22

I have an AWS EC2 p3.2xlarge instance. I can SSH and connect to it easily. However, after about 20 minutes, while I am running a Keras model on it, the connection is reset and I am kicked out with the error "Connection reset by 54.161.50.138 port 22". I can then reconnect, but I have to start training the model over again because my progress was lost. This happens every time I connect to the instance. Any idea why this is happening?
For SSH I am using Gow, which lets me run Linux commands on Windows - https://github.com/bmatzelle/gow/wiki
I checked my public IP address before and after the reset and it was the same.
I also looked at the CPU usage using Amazon CloudWatch, and it was normal, around 20%.
I figured out a partial solution to this. In the instance terminal, follow these steps (a compact sketch of the commands follows the list):
run the command "tmux"
in the new shell that pops up, execute the job
detach from the tmux shell by using the shortcut (Ctrl+b then d)
if the ssh connection resets, ssh to the instance again and run "tmux attach"
the job should have kept on running and you can resume where you left off
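Put together, the session might look roughly like this (the training command is a placeholder):
tmux                     # start a new tmux session on the instance
python train_model.py    # launch the job inside tmux
# press Ctrl+b, then d, to detach; the job keeps running on the instance
tmux attach              # after reconnecting over SSH, reattach and pick up where you left off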

I can ping my EC2 instance, but I cannot connect through ssh

A while back I created an RHEL EC2 instance, set it up correctly, and was able to connect to it through PuTTY and WinSCP. It hasn't been used for a while, but recently it needed to be accessed again. I went to log in but wasn't able to, so I rebooted the instance and tried to reconnect, but I cannot anymore. I get the error "Network error: Connection refused."
I tried recreating the .ppk from the .pem, and also opening all ports to all IPs. What could have caused this unreachability, and are there any troubleshooting tips for me to connect to it again?
There are a few things to check here:
Did you have anything running on the box that might have caused it to become unresponsive over time? This is somewhat unlikely since you said you rebooted the machine.
Check your security group settings to ensure that the firewall is not blocking your SSH port; the instance itself has no way of knowing whether the Amazon network will actually accept connections on the port it is listening on.
Amazon hardware can fail and cause your instance to become unresponsive. Go to the Instances page on your EC2 console and see if 2/2 of the status checks are passing. If less than 2 are passing, this is probably a failed instance situation.
As a last resort, try right-clicking the instance and checking the system log for anything that might have caused the instance to not listen for SSH connections (the status checks and system log can also be pulled from the AWS CLI, as sketched below).
Hopefully you have your data on an EBS volume such that you can simply stop and start the instance and have it come up on different hardware. While it would be nice if Amazon provided console level access to the box, unfortunately they do not presently (as far as I know).
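For the status checks and the system log, the same information is available from the AWS CLI if that's easier than clicking through the console (the instance ID is a placeholder):
aws ec2 describe-instance-status --instance-ids i-0123456789abcdef0
aws ec2 get-console-output --instance-id i-0123456789abcdef0 --output text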