How to run AWS SageMaker lifecycle config scripts as a background job - amazon-web-services

I am trying to customize Amazon SageMaker Notebook Instances using Lifecycle Configurations because I need to install additional pip packages. This means I have to create an on-start.sh and an on-create.sh script within a lifecycle configuration. You can see a sample here.
Now, I have many packages, and the installation time might exceed 5 minutes, causing a potential timeout. The suggested workaround in that case is to use nohup to run the script as a background job.
But how do I run this with nohup, since I do not have a terminal in this case? Is there a way to run the script as a background job from within the script itself? Is there anything else I am missing?

I have done this before, installing many libraries for around 15 minutes. I wrapped the script I actually want to run in a create.sh and ran that create.sh using nohup. The logs are viewable in CloudWatch, the SageMaker start won't time out, and as a bonus you get a nohup.out file in the directory where you executed nohup.
Below, I wrapped the script from https://github.com/aws-samples/amazon-sagemaker-notebook-instance-lifecycle-config-samples/tree/master/scripts/export-to-pdf-enable into create.sh:
#!/bin/bash
set -e
# Write the long-running work into create.sh. The quoted 'EOF' delimiter is
# deliberately reused: the first standalone EOF (ending the sudo block's body)
# also terminates this outer heredoc, so create.sh is written without the
# closing delimiter of its inner heredoc.
cat <<'EOF'>create.sh
#!/bin/bash
sudo -u ec2-user -i <<'EOF'
set -e
# OVERVIEW
# This script enables Jupyter to export a notebook directly to PDF.
# nbconvert depends on XeLaTeX and several LaTeX packages that are non-trivial to
# install because `tlmgr` is not included with the texlive packages provided by yum.
# REQUIREMENTS
# Internet access is required in on-create.sh in order to fetch the latex libraries from the ctan mirror.
sudo yum install -y texlive*
unset SUDO_UID
ln -s /home/ec2-user/SageMaker/.texmf /home/ec2-user/texmf
EOF
# Append the inner heredoc's closing EOF that the outer heredoc swallowed above:
echo 'EOF' >> create.sh
# Detach create.sh so the lifecycle script returns within the 5-minute limit;
# output goes to nohup.out (and to CloudWatch).
nohup bash create.sh &
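The nested-heredoc version above is what I actually ran. As a simpler, self-contained sketch of the same nohup pattern (hypothetical filenames and trivial stand-in work), it boils down to:

```shell
#!/bin/bash
# Sketch only: write the slow steps to a helper script, then detach it
# with nohup so the lifecycle script itself returns well under 5 minutes.
cat > /tmp/slow-setup.sh <<'SCRIPT'
#!/bin/bash
echo "installing packages..."   # stand-in for the real pip/yum installs
sleep 1
echo "setup finished"
SCRIPT
chmod +x /tmp/slow-setup.sh
nohup /tmp/slow-setup.sh > /tmp/slow-setup.log 2>&1 &
echo "lifecycle script can exit now; background PID: $!"
```

Once the background job finishes, /tmp/slow-setup.log (or nohup.out, if you don't redirect) contains its output, and the same lines show up in CloudWatch.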

Related

Create custom kernel via post-startup script in Vertex AI User Managed notebook

I am trying to use a post-startup script to create a Vertex AI User Managed Notebook whose Jupyter Lab has a dedicated virtual environment and corresponding computing kernel when first launched. I have had success creating the instance and then, as a second manual step from within the Jupyter Lab > Terminal, running a bash script like so:
#!/bin/bash
cd /home/jupyter
mkdir -p env
cd env
python3 -m venv envName --system-site-packages
source envName/bin/activate
envName/bin/python3 -m pip install --upgrade pip
python -m ipykernel install --user --name=envName
pip3 install geemap --user
pip3 install earthengine-api --user
pip3 install ipyleaflet --user
pip3 install folium --user
pip3 install voila --user
pip3 install jupyterlab_widgets
deactivate
jupyter labextension install --no-build @jupyter-widgets/jupyterlab-manager jupyter-leaflet
jupyter lab build --dev-build=False --minimize=False
jupyter labextension enable @jupyter-widgets/jupyterlab-manager
However, I have not had luck using this code as a post-startup script (supplied through the console creation tools, as opposed to the command line, thus far). When I open Jupyter Lab and look at the relevant structures, I find that there is no environment or kernel. Could someone please provide a working example that accomplishes my aim, or otherwise describe the order of build steps that one would follow?
Post startup scripts run as root.
When you run:
python -m ipykernel install --user --name=envName
the kernel is installed for the current user, which is root, whereas the Terminal runs as the jupyter user.
Option 1) Have 2 scripts:
Script A. The contents specified in the original post. Example: gs://newsml-us-central1/so73649262.sh
Script B. Downloads Script A and executes it as jupyter. Example: gs://newsml-us-central1/so1.sh; use it as the post-startup script.
#!/bin/bash
set -x
gsutil cp gs://newsml-us-central1/so73649262.sh /home/jupyter
chown jupyter /home/jupyter/so73649262.sh
chmod a+x /home/jupyter/so73649262.sh
su -c '/home/jupyter/so73649262.sh' jupyter
Option 2) Create the file in bash using a heredoc (EOF): write the contents into a single file and execute it as shown above.
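A minimal sketch of Option 2, with a placeholder target directory (on a real instance you would set TARGET_HOME=/home/jupyter and run the result via su -c '...' jupyter, as in Option 1; the heredoc body here is illustrative, not the full install):

```shell
#!/bin/bash
set -x
# Placeholder: on a real instance set TARGET_HOME=/home/jupyter
TARGET_HOME=${TARGET_HOME:-$HOME}
# The quoted 'EOF' delimiter writes the body verbatim, with no expansion here
cat > "$TARGET_HOME/setup.sh" <<'EOF'
#!/bin/bash
cd "$(dirname "$0")"
mkdir -p env
python3 -m venv env/envName --system-site-packages
EOF
chmod a+x "$TARGET_HOME/setup.sh"
# On the real instance, as root:
#   chown jupyter "$TARGET_HOME/setup.sh"
#   su -c "$TARGET_HOME/setup.sh" jupyter
```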
This is being posted as support context for the accepted solution from @gogasca.
@gogasca's suggestion (I'm using Option 1) works great, if you are patient. Through many attempts, I discovered that the inconsistent behavior was based on timing of access. Using Option 1, the User Managed Notebook appears available for use in Vertex AI Workbench (green check and clickable "OPEN JUPYTERLAB" link) before the installation script(s) have finished.
If you open the Notebook too soon, you will find two things: (1) you will be prompted for a recommended Jupyter Lab build, for instance:
Build Recommended
JupyterLab build is suggested:
@jupyter-widgets/jupyterlab-manager changed from file:../extensions/jupyter-widgets-jupyterlab-manager-3.1.1.tgz to file:../extensions/jupyter-widgets-jupyterlab-manager-5.0.3.tgz
and (2) while the custom environment/kernel is present and accessible, if you try to use ipyleaflet or ipywidget tools, you will see one of several JavaScript errors, depending on how quickly you try to use the kernel relative to the build that is (apparently) continuing in the background: Error displaying widget: model not found, and/or a broken-page icon with a JavaScript error that, if clicked, shows something like:
[Open Browser Console for more detailed log - Double click to close this message]
Failed to load model class 'LeafletMapModel' from module 'jupyter-leaflet'
Error: No version of module jupyter-leaflet is registered
at f.loadClass (https://someURL.notebooks.googleusercontent.com/lab/extensions/@jupyter-widgets/jupyterlab-manager/static/134.bcbea9feb6e7c4da7530.js?v=bcbea9feb6e7c4da7530:1:74856)
at f.loadModelClass (https://someURL.notebooks.googleusercontent.com/lab/extensions/@jupyter-widgets/jupyterlab-manager/static/150.3e1e5adfd821b9b96340.js?v=3e1e5adfd821b9b96340:1:10729)
at f._make_model (https://someURL.notebooks.googleusercontent.com/lab/extensions/@jupyter-widgets/jupyterlab-manager/static/150.3e1e5adfd821b9b96340.js?v=3e1e5adfd821b9b96340:1:7517)
at f.new_model (https://someURL.notebooks.googleusercontent.com/lab/extensions/@jupyter-widgets/jupyterlab-manager/static/150.3e1e5adfd821b9b96340.js?v=3e1e5adfd821b9b96340:1:5137)
at https://someURL.notebooks.googleusercontent.com/lab/extensions/@jupyter-widgets/jupyterlab-manager/static/150.3e1e5adfd821b9b96340.js?v=3e1e5adfd821b9b96340:1:6385
at Array.map ()
at f._loadFromKernel (https://someURL.notebooks.googleusercontent.com/lab/extensions/@jupyter-widgets/jupyterlab-manager/static/150.3e1e5adfd821b9b96340.js?v=3e1e5adfd821b9b96340:1:6278)
at async f.restoreWidgets (https://someURL.notebooks.googleusercontent.com/lab/extensions/@jupyter-widgets/jupyterlab-manager/static/134.bcbea9feb6e7c4da7530.js?v=bcbea9feb6e7c4da7530:1:77764)
The solution is simply to keep waiting. In my demo script, I transfer a file at the end of the build process. If I wait long enough for this file to actually appear in the instance directories, the rebuild recommendation is absent and the extensions work properly.
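One way to automate that wait is to poll for the sentinel file. The paths here are hypothetical, and the backgrounded touch merely simulates the installer finishing; in reality the post-startup script would create the marker as its final step:

```shell
#!/bin/bash
MARKER=/tmp/startup-done.marker
rm -f "$MARKER"
# Simulation only: stand-in for the post-startup script's final step,
# which may land many minutes after the instance shows the green check.
( sleep 1; touch "$MARKER" ) &
until [ -f "$MARKER" ]; do
  echo "build still running; not opening JupyterLab yet..."
  sleep 1
done
echo "marker found: extensions should now load without a rebuild prompt"
```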

"RUN true" in dockerfile

When I took over a project, I found a command "RUN true" in the Dockerfile.
FROM xxx
RUN xxx
RUN true
RUN xxx
I don't know what this command does; can anyone explain? In my opinion it makes no sense, but I'm not sure whether it has some other use.
There is documentation about Creating Images where you can see it:
RUN true \
&& dnf install -y --setopt=tsflags=nodocs \
httpd vim \
&& systemctl enable httpd \
&& dnf clean all \
&& true
@David Maze
A test for it. Dockerfile:
FROM centos:7.9.2009
RUN yum install tmux -y
RUN yum install not_exists -y
build log:
Sending build context to Docker daemon 2.048kB
Step 1/3 : FROM centos:7.9.2009
---> eeb6ee3f44bd
Step 2/3 : RUN yum install tmux -y
---> Running in 6c6e29ea9f2c
...omit...
Complete!
Removing intermediate container 6c6e29ea9f2c
---> 7c796c2b5260
Step 3/3 : RUN yum install not_exists -y
---> Running in e4b7096cc42b
...omit...
No package not_exists available.
Error: Nothing to do
The command '/bin/sh -c yum install not_exists -y' returned a non-zero code: 1
Modified Dockerfile:
FROM centos:7.9.2009
RUN yum install tmux -y
RUN yum install tree -y
build log:
Sending build context to Docker daemon 2.048kB
Step 1/3 : FROM centos:7.9.2009
---> eeb6ee3f44bd
Step 2/3 : RUN yum install tmux -y
---> Using cache
---> 7c796c2b5260
Step 3/3 : RUN yum install tree -y
---> Running in 180b32cb44f3
...omit...
Installed:
tree.x86_64 0:1.6.0-10.el7
Complete!
Removing intermediate container 180b32cb44f3
---> 4e905ed25cc2
Successfully built 4e905ed25cc2
Successfully tagged test:v0
You can see Using cache 7c796c2b5260: even without a "RUN true" command, the first "RUN" step's cache is reused.
RUN true as a standalone command does absolutely nothing and it's safe to delete it.
/bin/true is a standard shell command. It reads no input, produces no output, and neither reads nor writes files; it just exits with a status code of 0 ("success"). Running it as a Docker step will have no effect on the final image other than inserting an additional layer into the docker history.
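You can confirm that behavior in any shell:

```shell
# true reads nothing, prints nothing, and always exits 0
true
echo "exit status: $?"   # prints: exit status: 0
```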
The one clever use I can think of for this is to cause a later part of a Dockerfile to re-run. Imagine a Dockerfile like
RUN some_expensive_command http://server-a.example.com/input1
RUN another_expensive_command http://server-b.example.com/input2
If the second input changes, you might want to rebuild this image. docker build --no-cache will re-run the first step too, though, and that could take longer than you want. Inserting a RUN true line between the two lines would break Docker's layer caching, but only after the first command has run.
# identical RUN line as before, from cache
RUN some_expensive_command http://server-a.example.com/input1
# not the same RUN line, so "executes" (but does nothing)
RUN true
# not running commands from cache any more
RUN another_expensive_command http://server-b.example.com/input2
I found an already existing answer which explains it quite well.
And to quote the answer here:
Running and thus creating a new container even if it terminates still keeps the resulting container image and metadata lying around which can still be linked to.
So when you run docker run ... /bin/true you are essentially creating a new container for storage purposes and running the simplest thing you can.
Docker 1.5 introduced the docker create command, so I believe you can now "create" containers without confusingly running something like /bin/true.
And I found a quick explanation on the best-practices GitHub page, under the section '#chaining-commands', saying:
The first and last commands of the block are special.
If you would like to prepend or append a one-line command to the block, you will have to edit two lines - one that you are adding and the first or last commands. The first command is on the same line as the RUN directive, whereas the last command lacks the trailing backslash.
Editing a line with a command that you don't want to change presents a risk of introducing a bug, and you also obscure the line's history. This can be mitigated by making both the first and last commands true - they don't do anything.

Unable to install AWS SAM Cli on Mac

I am trying to install the AWS SAM CLI on my Mac because I am trying to learn the AWS services. I installed the AWS CLI successfully using the bundled installer, but when I tried to install the AWS SAM CLI as well, it did not work. This is what I have done so far.
I ran this command:
pip install --user aws-sam-cli
Everything went fine.
Then I opened and edited ~/.bash_profile. This is its content:
export PATH=/Applications/MAMP/bin/php/php7.2.7/bin:$PATH
# Find your Python User Base path (where Python --user will install packages/scripts)
$ USER_BASE_PATH=$(python -m site --user-base)
# Update your preferred shell configuration
-- Standard bash --> ~/.bash_profile
-- ZSH --> ~/.zshrc
export PATH=$PATH:$USER_BASE_PATH/bin
Then I closed the terminal and ran sam --version.
It says command not found. What is wrong with my installation?
The now-recommended way to install the SAM CLI is to use brew, and honestly it's way better and saves you a lot of headaches, like the one you're facing now. See these instructions for details.
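If you do stick with the pip route, note that the ~/.bash_profile in the question copies the instructions verbatim, including the `$` prompt and the `-- Standard bash` comment lines, which are not valid shell. A sketch of the two lines it actually needs (use `python` or `python3` as appropriate on your machine):

```shell
# Put the Python user-base bin directory (where pip install --user drops
# console scripts such as sam) on PATH
USER_BASE_PATH=$(python3 -m site --user-base)
export PATH="$PATH:$USER_BASE_PATH/bin"
echo "user base: $USER_BASE_PATH"
```

After saving, open a new terminal (or source ~/.bash_profile) before retrying sam --version.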

How do I run docker on AWS?

I have an AWS CodePipeline which currently deploys code to my EC2 instances successfully.
I have a Docker image with the setup needed to run my code (Dockerfile provided below). When I run docker run -t it loads an interactive shell in the container, but then hangs on any command (e.g. ls).
Any advice?
FROM continuumio/anaconda2
RUN apt-get install git
ENV PYTHONPATH /app/phdcode/panaxeaA1
# setting up venv
RUN conda create --name panaxea -y
RUN /bin/bash -c "source activate panaxea"
# Installing necessary packages
RUN conda install -c guyer pysparse
RUN conda install -c conda-forge pympler
RUN pip install pysparse
RUN git clone https://github.com/usnistgov/fipy.git
RUN cd fipy && python setup.py install
RUN cd ~
WORKDIR /app
COPY . /app
RUN cd panaxeaA1/models/alpha04c/launchers
RUN echo "launching..."
CMD python launcher_260818_aws.py
docker run -t simply starts a container with a pseudo-TTY connected to the container's stdin. However, this alone does not establish an interactive shell in the container, and you need one to be able to run commands within it.
You need to also append the -i command line flag along with the shell you wish to use. For example, docker run -it IMAGE_NAME bash will launch a container from the image you provide using bash as your interactive shell. You can then run Bash commands as you normally would.
If you are looking for a simple way to run containers on EC2 instances in AWS, I highly recommend AWS EC2 Container Service (ECS) as an option. It is a very simple service for running containers that abstracts and manages much of the server level work involved in running containers.
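As an aside about the Dockerfile itself: each RUN executes in a fresh shell, so lines like `RUN cd panaxeaA1/models/alpha04c/launchers` do not affect later instructions; directory changes that should persist have to come from WORKDIR. A sketch of that tail end, using the same paths as the question:

```dockerfile
# cd inside RUN does not carry over to the next instruction; WORKDIR does
WORKDIR /app
COPY . /app
WORKDIR /app/panaxeaA1/models/alpha04c/launchers
CMD ["python", "launcher_260818_aws.py"]
```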

Missing AWS Dependency

I have a case where I need to configure an AWS setup similar to the architecture described in this article, but the article is old, and when I followed the steps I couldn't get past the step where I run the script "vip_monitor.sh".
To be specific, at step 5, running the script gave me the following error:
Can't open /etc/profile.d/aws-apitools-common.sh
That shell script doesn't exist anywhere on the machine. How do I solve this issue?
Thanks in advance
You will have to set up the API tools manually.
Ubuntu makes their own AMIs for Amazon, and they don't build the API tools into the images.
You can use the official Ubuntu documentation to fix this and install the EC2 API tools:
sudo apt-add-repository ppa:awstools-dev/awstools
sudo apt-get update
sudo apt-get install ec2-api-tools
I actually installed the ec2-api-tools as J.Parashar instructed, and when I ran the script vip_monitor.sh it gave me the same error. So I took the missing aws-apitools-common.sh script from an Amazon Linux instance, pasted it at the path /etc/profile.d/, changed its mode to executable with chmod +x aws-apitools-common.sh, and then ran the script vip_monitor.sh.
If you get the error "Unexpected operator", run the script with bash ./vip_monitor.sh.