How to save the state of the EC2 instances upon connection loss from local machine? - amazon-web-services

I need to train a deep learning model on AWS EC2 instances. I connect to the instances over SSH and then run the training on them. If my Wi-Fi goes down, I lose the connection to the instances and see the message "Connection broken pipes". I then have to establish the SSH connection again, which is like restarting the instances.
How can I save the state of the running instances so that after reconnecting I get the previous state back?

Running the command in the background is one way to handle this situation.
You can run commands using 'nohup' and they will continue even if your SSH session disconnects.
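As a minimal sketch of the nohup approach, the helper below starts a command in the background and sends its output to a log file, so the job keeps running if the SSH session drops. The function name run_detached, the log file name, and train.py are placeholders chosen for illustration, not names from the question.

```shell
#!/bin/sh
# Start a long-running command so that it survives an SSH disconnect.
# Usage: run_detached LOGFILE COMMAND [ARGS...]
run_detached() {
    log_file=$1
    shift
    # nohup makes the command ignore SIGHUP. Output goes to the log file,
    # stdin is detached, and '&' puts the job in the background.
    nohup "$@" > "$log_file" 2>&1 < /dev/null &
    # Print the PID so the job can be checked or stopped later.
    echo "$!"
}

# Example (train.py is a placeholder for your own training script):
# run_detached train.log python3 train.py
# tail -f train.log      # follow progress after reconnecting
```

Note that this keeps the process alive rather than saving its state. If the instance itself stops, the job is lost unless the training script writes its own checkpoints.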

Related

What can I do to solve Google Cloud vm instance network problem

The network was working before and I have not changed anything on the VM. After a few months, I can no longer access the VM instance.
The VM instance is running.
I get "Request timed out" when I ping the external network IP address.
I cannot access it over SSH, although the SSH port is open.
When I troubleshoot the SSH connection status in the browser, it is stuck on Network status.
What should I do to find the cause of the problem? After I restart the VM instance a few times, it runs normally for a while, but the problem appears again.
Any idea how to make sure the VM instance will not disconnect from the external network for this reason again?
Here is the resource consumption of my VM.
In this case, the VM instance should meet the VESTACP minimum system requirements, but you should also consider the workload running on it.
I recommend switching to a larger N1 machine type to provide good performance for the workload and machine requirements.

Getting "Connection refused" when running nodetool on Cassandra cluster on AWS EC2

I just set up a Cassandra cluster.
I have changed the seed, listen_address and broadcast_address in the cassandra.yaml file. But when I run the command
$ nodetool flush system
the command returns this error:
nodetool: Failed to connect to '127.0.0.1:7199' - ConnectException:
'Connection refused (Connection refused)'.
In the file etc/cassandra/cassandra-env.sh I modified JVM_OPTS as shown in the screenshot.
I use AWS. At the beginning I was on a t2.micro server, but I switched to a t2.large as recommended in many articles.
Finally, my ports are open as shown in this screenshot, and I use Ubuntu.
By default, remote JMX connections to Cassandra nodes are disabled. If you're running nodetool commands on the EC2 instance itself, it isn't necessary to modify the JVM options in cassandra-env.sh.
In fact, we discourage allowing remote JMX connections for security reasons. Only allow remote access if you're an expert and know what you're doing. Cheers!

AWS Connection timeout + EC2 Instance Connect not working

I tried to connect to a running EC2 instance with my usual settings, and it returns
ssh: connect to host ec2 port 22: Connection timed out
I tried to connect with the built-in "EC2 Instance Connect", to connect directly from the browser with the AWS account, it returns
There was a problem setting up the instance connection An error
occurred and we were unable to connect or stay connected to your
instance. If this instance has just started up, try again in a minute
or two.
The instance had been running for weeks. I am the only user with access to the AWS account and the SSH keys, and I didn't change any settings or restart it in the last ~3 weeks.
First, the timeout started ~1 week ago, and then, without any other change, my website (WordPress) suddenly started to show a database connection error (the database is inside the EC2 instance as well).
What I used to connect :
Either
ssh -i "Keys.pem" ec2-user@ec2-[public ip].eu-west-3.compute.amazonaws.com
Or
ssh ec2-user@[public ip] -i "Keys.pem"
Both show the same error. I used the first one several weeks ago and it worked well.
This timeout is most likely caused by incorrect security group rules.
Ensure that the security group rules attached to your instance allow inbound access from the source IP address you're trying to SSH from. The database connection error may also be related to this.
If you SSH to your host from a dynamic public IP address, you will need to adjust the rule every time your IP address changes. It might be more appropriate to set up a VPN so that you can connect privately to your host.
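For illustration, the inbound rule described above could be added with the AWS CLI roughly as follows. This is a sketch that assumes the AWS CLI is installed and configured with permission to modify the security group. The function name allow_ssh_from, the group ID, and the address are placeholders.

```shell
#!/bin/sh
# Allow inbound SSH (TCP port 22) from a single IPv4 address.
# Usage: allow_ssh_from SECURITY_GROUP_ID IPV4_ADDRESS
allow_ssh_from() {
    group_id=$1
    ip=$2
    # The /32 suffix limits the rule to exactly one address.
    aws ec2 authorize-security-group-ingress \
        --group-id "$group_id" \
        --protocol tcp \
        --port 22 \
        --cidr "$ip/32"
}

# Example with placeholder values (replace with your own group and IP):
# allow_ssh_from sg-0123456789abcdef0 203.0.113.10
```

With a dynamic IP address, the same call has to be repeated for each new address, and stale rules should be removed.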

AWS EC2 Instance connection reset by port 22

I have an AWS EC2 p3.2xlarge instance. I can SSH and connect to it easily. However, after about 20 minutes, while I am running a Keras model on it, it resets the connection and I am kicked out with the error Connection reset by 54.161.50.138 port 22. I am then able to reconnect, but I have to start training the model over again because my progress was lost. This happens every time I connect to the instance. Any idea why this is happening?
For ssh I am using gow which lets me run linux commands on windows - https://github.com/bmatzelle/gow/wiki
I checked my public ip address before and after the reset and it was the same.
I also looked at the CPU usage using Amazon CloudWatch, and it was normal at 20%.
I figured out a partial solution to this. In the instance terminal, follow these steps:
Run the command "tmux".
In the new shell that pops up, start the job.
Detach from the tmux shell using the shortcut (Ctrl+b then d).
If the SSH connection resets, SSH to the instance again and run "tmux attach".
The job should have kept running, and you can resume where you left off.
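The same steps can also be done non-interactively. The sketch below assumes tmux is installed on the instance. The function name start_in_tmux, the session name "train", and train.py are placeholders for illustration.

```shell
#!/bin/sh
# Start COMMAND in a detached tmux session named SESSION.
# Usage: start_in_tmux SESSION 'COMMAND'
start_in_tmux() {
    # -d creates the session without attaching to it, -s sets its name.
    tmux new-session -d -s "$1" "$2"
}

# Example (the session name and script are placeholders):
# start_in_tmux train 'python3 train.py'
# tmux ls                 # after reconnecting: list running sessions
# tmux attach -t train    # reattach; Ctrl+b then d detaches again
```

By default the session closes when the command exits, so redirect the output to a file if you need to read it after the job finishes.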

I can ping my EC2 instance, but I cannot connect through ssh

A while back I created a RHEL EC2 instance. I set it up correctly and was able to connect to it through PuTTY and WinSCP. It then went unused for a while, until recently when it needed to be accessed again. I tried to log in but wasn't able to, so I rebooted the instance and tried to reconnect, but I still cannot connect. I get the error "Network error: Connection refused."
I tried recreating the ppk from the pem file and also opened all ports to all IPs. What could have caused this unreachability, and are there any troubleshooting tips for connecting to it again?
There are a few things to check here:
Did you have anything running on the box that might have caused it to become unresponsive over time? This is somewhat unlikely since you said you rebooted the machine.
Check your security group settings to ensure that the firewall is not blocking your SSH port. The instance has no way of knowing whether connections will actually be accepted by the Amazon network on the SSH listening port.
Amazon hardware can fail and cause your instance to become unresponsive. Go to the Instances page on your EC2 console and see if 2/2 of the status checks are passing. If fewer than 2 are passing, this is probably a failed instance situation.
As a last resort, try right-clicking the instance and checking the system log for anything that might have caused the instance to not listen for SSH connections.
Hopefully you have your data on an EBS volume such that you can simply stop and start the instance and have it come up on different hardware. While it would be nice if Amazon provided console level access to the box, unfortunately they do not presently (as far as I know).
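The status checks and the system log mentioned above can also be read from the command line. This is a sketch that assumes the AWS CLI is installed and configured. The function names and the instance ID are placeholders.

```shell
#!/bin/sh
# Show the system and instance status checks for one instance,
# including instances that are not in the running state.
show_status_checks() {
    aws ec2 describe-instance-status \
        --instance-ids "$1" \
        --include-all-instances
}

# Print the instance's system (console) log as plain text.
show_system_log() {
    aws ec2 get-console-output \
        --instance-id "$1" \
        --query Output \
        --output text
}

# Example with a placeholder instance ID:
# show_status_checks i-0123456789abcdef0
# show_system_log i-0123456789abcdef0
```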