How to handle GCP preemptible shutdown for Jupyter notebooks - google-cloud-platform

Preemptible VM instances in Google Cloud Platform can be forcibly shut down at any time, but they let you run a shutdown script to avoid losing data. How do I use that script to trigger a specific interrupt in my Jupyter notebook?

I have come up with a solution.
from os import getpid, kill
import signal
import psutil

def get_active_kernels():
    # Return the PIDs of every running ipykernel process except our own
    active_kernels = []
    pids = psutil.pids()
    my_pid = getpid()
    for pid in pids:
        if pid == my_pid:
            continue
        try:
            p = psutil.Process(pid)
            cmd = p.cmdline()
            for arg in cmd:
                if 'ipykernel' in arg:
                    active_kernels.append(pid)
                    break
        except (psutil.AccessDenied, psutil.NoSuchProcess):
            # skip processes we can't inspect or that exited in the meantime
            continue
    return active_kernels

if __name__ == '__main__':
    # SIGINT surfaces inside each notebook as a KeyboardInterrupt
    kernels = get_active_kernels()
    for kernel in kernels:
        kill(kernel, signal.SIGINT)
One can use this code as a shutdown script. It sends a keyboard interrupt to every running Jupyter kernel, so a simple try/except block that catches KeyboardInterrupt can be used inside the notebook:
try:
    ...  # regular code
except KeyboardInterrupt:
    ...  # code to save the progress
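For instance, a long-running cell might look like the sketch below; process() and the progress.pkl path are placeholders I made up for illustration, not part of the original answer:

import pickle
import time

def process(step):
    # stand-in for one unit of your real work
    time.sleep(1)
    return step * 2

results = []
try:
    for step in range(10000):  # regular long-running work
        results.append(process(step))
except KeyboardInterrupt:
    # The shutdown script delivers SIGINT, leaving a short window to persist progress
    with open('progress.pkl', 'wb') as f:
        pickle.dump(results, f)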

Jupyter notebooks run on a VM instance in the backend, and you can access that instance over SSH just like any other Compute Engine instance. This means that any script which works on a Compute Engine instance should also work for a Jupyter notebook.
From your description I understand you are referring to this shutdown script. That script saves a checkpoint while the instance is being shut down; it doesn't trigger the shutdown itself.
There are many ways to shut down an instance, either from inside the instance (a script) or from outside (Cloud Shell, the Console UI, ...).
Could you explain your specific purpose so I can help you further?

Related

capture jupyter-notebook stdout with subprocess

I'm working on creating a tool that allows users to run a jupyter-notebook with pyspark on an AWS server and forward the port to their localhost to connect to the notebook.
I've been using subprocess.Popen to ssh into the remote server and kick off the pyspark shell/notebook, but I can't avoid having it print everything to the terminal; I want to perform an action per line to retrieve the port number.
For example, running this (following the most popular answer here: Read streaming input from subprocess.communicate())
command = "jupyter-notebook"
con = subprocess.Popen(['ssh', node, command], stdout=subprocess.PIPE, bufsize=1)
with con.stdout:
for line in iter(con.stdout.readline, b''):
print(line),
con.wait()
this ignores the context manager, and the output from con goes straight to the terminal, so the following is immediately printed:
[I 16:13:20.783 NotebookApp] [nb_conda_kernels] enabled, 0 kernels found
[I 16:13:21.031 NotebookApp] JupyterLab extension loaded from /home/*****/miniconda3/envs/aws/lib/python3.7/site-packages/jupyterlab
[I 16:13:21.031 NotebookApp] JupyterLab application directory is /data/data0/home/*****/miniconda3/envs/aws/share/jupyter/lab
[I 16:13:21.035 NotebookApp] [nb_conda] enabled
...
...
...
I can get the context manager to function when I call a random script like the one below instead of "jupyter-notebook" (where command="bash random_script.sh"):
# random_script.sh
for i in $(seq 1 100)
do
    echo "some output: $i"
    sleep 2
done
This acts as expected, and I can actually perform an action per line within the with statement. Is there something fundamentally different about the jupyter version that prevents this from acting similarly?
The issue turned out to be that the console output produced by jupyter was actually going to stderr instead of stdout. I'm not sure why. But regardless, this change fixed the issue:
con = subprocess.Popen(['ssh', node, command],
                       stdout=subprocess.PIPE,
                       stderr=subprocess.STDOUT,  # <-- redirect stderr to stdout
                       bufsize=1)
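With stderr merged into stdout, the per-line handling the question asks for becomes possible. A minimal sketch, where the regex and the node hostname are my assumptions rather than part of the original answer:

import re
import subprocess

node = 'my-remote-host'  # placeholder SSH host
con = subprocess.Popen(['ssh', node, 'jupyter-notebook'],
                       stdout=subprocess.PIPE,
                       stderr=subprocess.STDOUT,
                       bufsize=1)

port = None
for line in iter(con.stdout.readline, b''):
    text = line.decode('utf-8', errors='replace')
    # Jupyter's startup banner contains a URL such as http://localhost:8888/?token=...
    match = re.search(r'https?://[^\s:/]+:(\d+)/', text)
    if match:
        port = int(match.group(1))
        print('notebook is listening on port', port)
        break  # the notebook keeps running; set up port forwarding from here

In a real tool you would keep draining con.stdout (or redirect it) after the break so the pipe buffer doesn't eventually fill up and block the remote process.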

Running Python script on AWS EC2

Apologies if this is a repeat, but I couldn't find anything worthwhile to accomplish my task.
I have an instance, and I have figured out how to start and stop it using boto3, which works. The real problem is running the script when the instance is up. I would like to wait for the script to finish and then stop the instance.
python /home/ubuntu/MyProject/TechInd/EuropeRun.py &
python /home/ubuntu/FTDataCrawlerEU/EuropeRun.py &
Reading quite a few posts points in the direction of Lambda and AWS Elastic Beanstalk, but those don't appear simple.
Any suggestion is greatly appreciated.
Regards
DC
You can use the following code.
import boto3
import os
from termcolor import colored
import paramiko

def stop_instance(instance_id, region_name):
    client = boto3.client('ec2', region_name=region_name)
    while True:
        try:
            client.stop_instances(
                InstanceIds=[
                    instance_id,
                ],
                Force=False
            )
        except Exception as e:
            print(e)
        else:
            break
    # Waiter to wait till the instance is stopped
    waiter = client.get_waiter('instance_stopped')
    try:
        waiter.wait(
            InstanceIds=[
                instance_id,
            ]
        )
    except Exception as e:
        print(e)

def ssh_connect(public_ip, cmd):
    # Join the paths using directory name and file name, to avoid OS conflicts
    key_path = os.path.join('path_to_aws_pem', 'file_name.pem')
    key = paramiko.RSAKey.from_private_key_file(key_path)
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    # Connect/ssh to an instance
    while True:
        try:
            client.connect(hostname=public_ip, username="ubuntu", pkey=key)
            # Execute a command after connecting/ssh to an instance
            stdin, stdout, stderr = client.exec_command(cmd)
            print(stdout.read())
            # Close the client connection once the job is done
            client.close()
            break
        except Exception as e:
            print(e)

# Main/other module where you're doing other jobs:
# Get the public IP address of the EC2 instance; I assume you already have a handle to the instance
# You can use any alternate method to fetch the public IP of your instance
public_ip = ec2_instance.public_ip_address
# Get the instance ID of the EC2 instance
instance_id = ec2_instance.instance_id
# Command to run/execute the python scripts
cmd = "nohup python /home/ubuntu/MyProject/TechInd/EuropeRun.py & python /home/ubuntu/FTDataCrawlerEU/EuropeRun.py &"
ssh_connect(public_ip, cmd)
print(colored('Script execution finished !!!', 'green'))
# Shut down/stop the instance
stop_instance(instance_id, region_name)
You can execute your shutdown command through python code once your script is done.
An example using ls:
from subprocess import call
call(["ls", "-l"])
But for something this simple, Lambda is much easier and more resource-efficient. You only need to upload your script to S3 and then execute the Lambda function through boto3.
Actually, you can just copy-paste your script code into the Lambda console if you don't have any dependencies.
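If you do go the Lambda route, invoking the function from your own machine is only a few lines of boto3; the function name, region, and payload below are made up for illustration:

import json
import boto3

lam = boto3.client('lambda', region_name='eu-west-1')  # assumed region
response = lam.invoke(
    FunctionName='europe-run',      # hypothetical function name
    InvocationType='Event',         # fire-and-forget; use 'RequestResponse' to wait for the result
    Payload=json.dumps({'source': 'manual'}),
)
print(response['StatusCode'])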
Some options for running the script automatically at system startup:
Call the script via the EC2 User-Data
Configure the AMI to start the script on boot via an init.d script, or an #reboot cron job.
To shut down the instance after the script is complete, add some code at the end of the script to either initiate an OS shutdown or call the AWS API (via boto3) to stop the instance, as in the sketch below.
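A minimal sketch of that last option, assuming the instance has an IAM role allowing ec2:StopInstances; note that instances with IMDSv2 enforced need a session token before querying the metadata service, which this sketch skips:

import urllib.request
import boto3

# Ask the instance metadata service for this instance's own ID (IMDSv1-style request)
instance_id = urllib.request.urlopen(
    'http://169.254.169.254/latest/meta-data/instance-id', timeout=2
).read().decode()

ec2 = boto3.client('ec2', region_name='eu-west-1')  # assumed region
ec2.stop_instances(InstanceIds=[instance_id])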

Periodic tasks in Django on Elastic Beanstalk (possibly with celery beat)

I'm trying to set up a daily task for my Django application on Elastic Beanstalk. There doesn't appear to be an accepted way to set this up, as celery beat is the go-to solution for periodic tasks in Django, but isn't great for load-balanced environments.
I've seen some solutions doing things like setting up celery beat with leader_only=True, to only run one instance, but that leaves a single point of failure. I've seen other solutions that allow many instances of celery beat and use locks to make sure only one task goes through, but wouldn't this still eventually fail completely unless the failed instances were restarted? Another suggestion I've seen is to have a separate instance for running celery beat, but this would still be a problem unless it had some way of restarting itself if it failed.
Are there any decent solutions to this problem? I would much rather not have to babysit a scheduler, as it would be pretty easy to not notice that my task was not being run until a while later.
If you're using redis as your broker, look into installing RedBeat as the celery beat scheduler: https://github.com/sibson/redbeat
This scheduler uses locking in redis to make sure only a single beat instance is running. With this you can enable beat on each node's worker process and remove the use of leader_only=True.
celery worker -B -S redbeat.RedBeatScheduler
Let's say you have Worker A with the beat lock and Worker B. If Worker A dies, Worker B will attempt to acquire the beat lock after a configurable amount of time.
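For completeness, a minimal sketch of the RedBeat-related settings (new-style lowercase Celery settings; the URLs and the timeout are illustrative values, so check the RedBeat docs for your setup):

# celeryconfig.py (or wherever the Celery app is configured)
broker_url = 'redis://localhost:6379/0'

# RedBeat keeps the schedule and its lock in redis
redbeat_redis_url = 'redis://localhost:6379/1'

# How long a dead beat's lock survives before another instance can take over
redbeat_lock_timeout = 300  # seconds; illustrative value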
I would suggest making a management command that runs with cron.
Using this method, you have your full Django ORM, all methods, etc. to work with. Wrapping your script in a try/except, you have the option to log failures in any way that you wish - email notifications, external logging systems like Sentry, straight to the DB, etc.
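A bare-bones version of such a command might look like this; the command name, the task body, and the logging target are placeholders:

# myapp/management/commands/run_daily_task.py
import logging
from django.core.management.base import BaseCommand

logger = logging.getLogger(__name__)

class Command(BaseCommand):
    help = "Runs the daily batch job; intended to be invoked from cron."

    def handle(self, *args, **options):
        try:
            # placeholder for the real work, with full access to the ORM
            self.stdout.write("daily task finished")
        except Exception:
            # route this wherever you monitor failures (Sentry, email, the DB, ...)
            logger.exception("daily task failed")
            raise

cron would then invoke it with something like python manage.py run_daily_task.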
I use supervisord to run cron, and it works well. It relies on time-tested tools that won't let you down.
Finally, using a database singleton to keep track of whether a batch job has been run or is currently running, in an environment where you have multiple load-balanced instances of Django, isn't bad practice, even if you feel a little icky about it. The DB is a very reliable means of telling you whether a job is being processed.
The one annoying thing about cron is that it doesn't import environment variables you may need for Django. I solved this with a simple Python script.
It writes the crontab on startup with needed environment variables etc. included. This example is for Ubuntu on EBS but should be relevant.
#!/usr/bin/env python
# run-cron.py
# sets environment variable crontab fragments and runs cron
import os
from subprocess import call
from master.settings import IS_AWS

# read django's needed environment variables and set them in the appropriate crontab fragment
eRDS_HOSTNAME = os.environ["RDS_HOSTNAME"]
eRDS_DB_NAME = os.environ["RDS_DB_NAME"]
eRDS_PASSWORD = os.environ["RDS_PASSWORD"]
eRDS_USERNAME = os.environ["RDS_USERNAME"]
try:
    eAWS_STAGING = os.environ["AWS_STAGING"]
except KeyError:
    eAWS_STAGING = None
try:
    eAWS_PRODUCTION = os.environ["AWS_PRODUCTION"]
except KeyError:
    eAWS_PRODUCTION = None
eRDS_PORT = os.environ["RDS_PORT"]

if IS_AWS:
    fto = '/etc/cron.d/stortrac-cron'
else:
    fto = 'test_cron_file'

with open(fto, 'w+') as file:
    file.write('# Auto-generated cron tab that imports needed variables and runs a python script')
    file.write('\nRDS_HOSTNAME=')
    file.write(eRDS_HOSTNAME)
    file.write('\nRDS_DB_NAME=')
    file.write(eRDS_DB_NAME)
    file.write('\nRDS_PASSWORD=')
    file.write(eRDS_PASSWORD)
    file.write('\nRDS_USERNAME=')
    file.write(eRDS_USERNAME)
    file.write('\nRDS_PORT=')
    file.write(eRDS_PORT)
    if eAWS_STAGING is not None:
        file.write('\nAWS_STAGING=')
        file.write(eAWS_STAGING)
    if eAWS_PRODUCTION is not None:
        file.write('\nAWS_PRODUCTION=')
        file.write(eAWS_PRODUCTION)
    file.write('\n')
    # Process queue of jobs
    file.write('\n*/8 * * * * root python /code/app/manage.py queue --process-queue')
    # Every 5 minutes, double-check thing is done
    file.write('\n*/5 * * * * root python /code/app/manage.py thing --done')
    # Every 4 hours, do this
    file.write('\n8 */4 * * * root python /code/app/manage.py process_this')
    # etc.
    file.write('\n3 */4 * * * root python /code/app/manage.py etc --silent')
    file.write('\n\n')

if IS_AWS:
    args = ["cron", "-f"]
    call(args)
And in supervisord.conf:
[program:cron]
command = python /my/directory/runcron.py
autostart = true
autorestart = false

How to launch an external program in a python script

I am creating a Python script where it does a bunch of tasks and one of those tasks is to launch and open google chrome. What is the ideal way of accomplishing that in my script?
You can launch anything from Python as a command-line call. This is done with the subprocess module.
For example:
import subprocess
sp = subprocess.Popen(r"PATH_TO_CHROME\chrome")
# If you want your process to wait for chrome to "complete"
ret_val = sp.wait()
# or
sp_out, sp_error = sp.communicate()
Maybe you can try with subprocess.call:
import subprocess
subprocess.call([r'C:\Users\username\AppData\Local\Google\Chrome\Application\chrome.exe'])
Maybe with the commands module (Python 2 only):
import commands
a = commands.getoutput("./PATH_TO_CHROME/chrome")
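On Python 3, where the commands module no longer exists, subprocess.getoutput is the drop-in replacement (the chrome path is still a placeholder):

import subprocess
a = subprocess.getoutput("./PATH_TO_CHROME/chrome")
print(a)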

How to make Fabric flush its output?

I'm using fabric to run ssh tasks on remote machines.
The output isn't flushed automatically; is there a way to force it to flush?
(the documentation doesn't appear to mention this subject)
When using Fabric's puts() to output some text, you can use the flush=True parameter to avoid buffering:
puts('Doing stuff', flush=True)
Or if you're concerned about the output from a remote command, you may want to flush the standard output after running the command:
run('some command')
sys.stdout.flush()
Note that some buffering may still occur in Fabric during execution of the command itself (not sure about it though), or within the remote command itself. In that case, you should see the same behavior when running it through Fabric or directly via SSH.
I'm slightly biased because I work there, but at my workplace we came up with a logging utility for Fabric named gusset. It allows you to have configurable logging with your Fabric scripts:
from fabric.api import run
from gusset.output import with_output

@with_output(verbosity=1)
def foo():
    run("ls")
In [9]: with settings(host_string="mybox", user="myuser"):
   ...:     foo()
   ...:
[mybox] run: ls