Possible to keep the GCloud VM Instances running without connection? - google-cloud-platform

The title explains the question itself. My problem is that every time I connect to my VM through SSH, the connection times out after a period of time. So I'd like to let my Python script work on its own for hours or days. Any advice? Thanks.

The VM instance will keep running even if your SSH session times out.
You can keep the SSH session alive by adding the following lines to your $HOME/.ssh/config file:
Host remotehost
HostName remotehost.com
ServerAliveInterval 240
There's a similar keep-alive option in PuTTY.
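If you prefer not to edit the config file, the same keep-alive can be passed per connection on the command line (the host and instance names below are placeholders):
ssh -o ServerAliveInterval=240 user@remotehost.com
# with the gcloud wrapper, options after -- are handed to the underlying ssh client
gcloud compute ssh my-instance -- -o ServerAliveInterval=240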
To keep the process alive after disconnecting, you have multiple options, including those already suggested in the comments:
nohup
screen
setsid
cron
service/daemon
Which one to choose depends on the specifics of the task the script performs.
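As a minimal sketch of the first two options, assuming the long-running job is a script called my_script.py (the file and session names are placeholders):
# option 1: nohup keeps the process running after logout; output goes to nohup.out
nohup python3 my_script.py &
# option 2: screen runs the job in a detachable session you can reattach to later
screen -S myjob            # start a named session, then run the script inside it
python3 my_script.py       # detach with Ctrl+a then d; log out freely
screen -r myjob            # reattach after logging back in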

Related

Using plink commands from Rstudio with the AWS EC2 has disabled the instance DNS

I was trying to process GWAS data using plink 1.9 on a rented AWS Ubuntu server. I executed the plink commands from the terminal window in the RStudio Server.
It turned out that if I execute a plink command that overloads the server, my RStudio Server becomes inaccessible, and the problem does not resolve itself.
For example, my RStudio Server on port 8787 has become unavailable:
http://ec2-54-64-41-xxx.ap-northeast-1.compute.amazonaws.com:8787/
I accidentally did it twice. The first time I ran something like cat xxx.vcf (how stupid of me), the server simply froze and the RStudio Server crashed.
Since I could still access the server with PuTTY, WinSCP and so on, I managed to get my files onto a new instance. Then I tried to use plink to do some QC, something like
./plink --bfile xxx --mind 1 --geno 0.01 --maf 0.05 --make-bed --out yyy
It overloaded the server again and the same RStudio Server trouble occurred.
Both instances are still accessible from PuTTY. I logged on to check the running processes and everything seemed fine: there were no active heavy jobs and no zombie processes either.
The CPU monitoring looks fine too.
The only problem is that the RStudio Server link is not working.
Does anyone have similar experiences? Your advice is very much appreciated.
mindy
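Two things that might be worth checking, offered only as a hedged sketch since I can't reproduce this exact setup: whether the RStudio Server service itself died while sshd survived, and whether plink's memory use can be capped so it doesn't starve the rest of the box (plink 1.9 accepts a --memory limit in MB).
sudo rstudio-server status     # see whether the RStudio Server service is still running
sudo rstudio-server restart    # restart it if it has died
# re-run the QC step with an explicit memory cap (the 2048 MB value is only an example)
./plink --bfile xxx --mind 1 --geno 0.01 --maf 0.05 --make-bed --out yyy --memory 2048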

Can't keep SSH connection to VM using gcloud-sdk

I have a Google Cloud Deep Learning Virtual Machine Image for PyTorch that uses an SSH connection to connect to the Jupyter Notebook on it. How can I change what I am currently doing so that the Jupyter Notebook remains alive even when I close my laptop or temporarily disconnect from the internet?
Currently, after turning on my VM and opening a tmux window, I start up the Jupyter Notebook and its SSH connection with this command:
gcloud compute ssh <my-server-name> -- -L 8080:localhost:8080
This code is taken from the official docs for the deep learning images here: https://cloud.google.com/deep-learning-vm/docs/jupyter
I can then connect at localhost:8080 and do what I need to. However, if I start training a model for a long time and need to close my laptop, when I re-open it my SSH connection is broken, the Jupyter Notebook is turned off, and the model I was training is interrupted.
How can I keep this Jupyter Notebook alive and be able to reconnect to it later?
NB: I used to use the Google Cloud browser SSH option and, once in the server, start a tmux window and the Jupyter notebook within it. This worked great and meant the notebook was always alive. However, with the Google Cloud images that have CUDA and Jupyter preinstalled, this doesn't work, and the only way I have been able to connect is through the above command.
I have faced this problem before on GCP too and found a simple way to resolve it. Once you have SSH'd into the Compute Engine instance, run the Linux screen command and you will find yourself in a virtual terminal (you may open many terminals in parallel); this is where you will want to run your long-running job.
Once you have started the job, detach from the screen using the keys Ctrl+a and then d. Once detached, you can exit the VM, reconnect to it later, run screen -r, and you will find that your job is still running.
Of course, you can do a lot of cool stuff with the screen command and I would encourage you to read some of the tutorials found here.
NOTE: Please ensure that your Compute Engine instance is not a Pre-emptible machine!
Let me know if this helps!
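Putting that together with the command from the question, the workflow could look roughly like this (the session name and training command are placeholders):
# from your laptop: open the tunnel and log in
gcloud compute ssh <my-server-name> -- -L 8080:localhost:8080
# on the VM: start a named screen session and run the long job inside it
screen -S training
python train.py            # detach with Ctrl+a then d; the job survives the disconnect
# later, after reconnecting with the same gcloud command:
screen -r training         # reattach to the still-running job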
I think it's better to install Jupyter as a server, so your job can keep running even when you disconnect.
There are some things you might also want to know.
This is not the multi-user server you are looking for. This document describes how you can run a public server with a single user. This should only be done by someone who wants remote access to their personal machine. Even so, doing this requires a thorough understanding of the set-up's limitations and security implications. If you allow multiple users to access a notebook server as it is described in this document, their commands may collide, clobber and overwrite each other.
If you want a multi-user server, the official solution is JupyterHub. To use JupyterHub, you need a Unix server (typically Linux) running somewhere that is accessible to your users on a network. This may run over the public internet, but doing so introduces additional security concerns.
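For reference, a rough outline of the single-user notebook server setup that the quoted document describes (the port number is just an example):
jupyter notebook --generate-config     # creates ~/.jupyter/jupyter_notebook_config.py
jupyter notebook password              # stores a hashed login password
# start it detached so it survives the SSH session ending (nohup is one option)
nohup jupyter notebook --no-browser --port 8080 &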

Is there any way to know which process is using memory in EC2 if I can't connect via ssh

Last night I received an error log (I use Rollbar) from my server with the message "NoMemoryError: failed to allocate memory".
When I was finally able to access my server, it took a long time, but I could connect by SSH. Sadly, every command I ran (free -m, top, ps, etc.) returned "cannot fork: Cannot allocate memory".
Now I can't even access the server, I get "ssh_exchange_identification: read: Connection reset by peer"
This happened before and I just rebooted the machine, but now I want to know what is happening in order to prevent it from happening again. It's an m3.medium (with Ubuntu) and hosts a staging env, so I think it shouldn't have memory problems.
I wonder if there is any way, in the AWS Console, to see what is happening or to free some memory so that I can at least connect via SSH.
Any ideas?
If you really have no idea what the problem is, then write a script like this:
#!/bin/bash
# append a timestamped snapshot of memory usage and the top memory consumers to a log file
FILE=/var/log/memoryproblem.log
date +'%c' >> $FILE
# overall memory usage in MB
free -m >> $FILE
# top processes sorted by memory usage (%MEM is column 4 of ps axu output)
ps axu | sort -rn -k 4,5 | head >> $FILE
Make cron run this at regular intervals (an example entry is shown below).
This will log quite a lot of information, so clean it up on a regular basis.
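For example, a crontab entry that runs the script every five minutes could look like this (the script path is a placeholder):
# added with crontab -e; runs the logging script every five minutes
*/5 * * * * /usr/local/bin/memoryproblem.sh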
Oh, and another thing: there is one way of seeing log information on a host apart from SSH. In the AWS console view of EC2 instances, select the instance, right click, and go to Instance Settings -> System Log; it may be useful in this situation.
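The same system log can also be pulled from the command line with the awscli (the instance ID here is a placeholder):
# dump the instance's console/system log without needing to SSH in
aws ec2 get-console-output --instance-id i-0123456789abcdef0 --output text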
Another thing to do is to temporarily increase the instance size. An m3.medium only has 3.75GB of RAM. If you move up to an m3.xlarge with 15GB of RAM, the problem may still occur, but the extra resources make it possible to see what is going on. Once you've fixed the issue you can go back to a smaller instance.
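A rough sketch of that resize using the awscli (the instance ID and target type are examples; the instance must be stopped before its type can be changed):
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 --instance-type "{\"Value\": \"m3.xlarge\"}"
aws ec2 start-instances --instance-ids i-0123456789abcdef0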

Running a command on remote machine without ssh delay

I want to establish an API server with a load balancer. I will use one machine as the master which will assign tasks in a round robin manner to 2 slave machines. All machines are hosted in AWS.
In order to process a request, I need to run a Python script. When the master receives a request, it can trigger the command on one of the slaves using ssh, but this adds an extra couple of seconds of delay to the processing time.
Is there a way to reduce/remove the ssh delay?
Not sure if you have something implemented already or are just collecting thoughts.
The basic use case is described on wikibooks, but the easiest approach is to set up public key authentication and an ssh_config entry (the config for machine2 would be almost the same):
Host machine1
HostName machine1.example.org
ControlPath ~/.ssh/controlmasters/%r@%h:%p
ControlMaster auto
ControlPersist yes
IdentityFile ~/.ssh/id_rsa-something
And then call the remote script like this:
ssh machine1 ./remote_script.py
The first ssh call will initiate the connection (and will take a bit longer); every subsequent call will reuse the existing connection and be almost immediate.
If you are using Python, you can achieve similar behaviour using paramiko, or even Ansible if you want to step one level up (it really depends on the use case).
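One practical detail worth adding (a small sketch; machine1 and the script name come from the example above): the ControlPath directory has to exist before the first connection, and ssh's -O flag lets you check or close the shared master connection.
mkdir -p ~/.ssh/controlmasters        # the directory referenced by ControlPath must exist
ssh machine1 ./remote_script.py       # first call opens the master connection (slower)
ssh machine1 ./remote_script.py       # subsequent calls reuse it and return almost immediately
ssh -O check machine1                 # verify the master connection is still alive
ssh -O exit machine1                  # close the shared connection when you are done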

How to determine that an AWS EC2 instance is still initialising from a script

Is there a way to determine through a command line interface or other trick if an AWS EC2 instance is ready to receive ssh connections?
The running state seems not to be enough. When trying to connect in the first minutes of the running state, the machine's status checks still show "initialising" and ssh times out while trying to connect.
(I am using the awscli pip package.)
Running is similar to turning a computer on and finishing its BIOS check; as far as the hypervisor is concerned, your instance is on.
The best way to know when your instance is ready is to run a script at the end of startup (or once certain services are up) that reports its status to some other listener. Using that data, or event, you will know that your instance is ready to be connected to. This is purposely vague since there are so many different ways this can be accomplished.
You could also estimate the expected startup time, try to connect after that, and retry the connection if it fails. You still need a point at which you stop trying, as instances can fail to launch in some cases.
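Since the question mentions the awscli package, here is a rough sketch combining its built-in waiter with a simple SSH retry loop (the instance ID, key file, and host name are placeholders):
# block until both the instance and system status checks pass
aws ec2 wait instance-status-ok --instance-ids i-0123456789abcdef0
# then retry SSH with a short timeout, giving up after 10 attempts
for i in $(seq 1 10); do
  ssh -o ConnectTimeout=5 -i mykey.pem ubuntu@ec2-xx-xx-xx-xx.compute.amazonaws.com 'echo ready' && break
  sleep 15
done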