SageMaker lifecycle config: could not find conda environment conda_python3

The script below should run a notebook called prepTimePreProcessing whenever an AWS notebook instance starts running.
However, I am getting a "could not find conda environment conda_python3" error from the lifecycle config file.
#!/bin/bash
set -e
ENVIRONMENT=python3
NOTEBOOK_FILE="/home/ec2-user/SageMaker/prepTimePreProcessing.ipynb"
echo "Activating conda env"
source /home/ec2-user/anaconda3/bin/activate "$ENVIRONMENT"
echo "Starting notebook"
# run nbconvert in the background so the lifecycle script can return
nohup jupyter nbconvert --to notebook --inplace --ExecutePreprocessor.timeout=600 --ExecutePreprocessor.kernel_name=python3 --execute "$NOTEBOOK_FILE" &
Any help would be appreciated.

Assuming there are no environment problems, open a terminal on the instance in use and run:
conda env list
the result should also contain this line:
python3 /home/ec2-user/anaconda3/envs/python3
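If that line is missing, activation will fail with exactly the error above. A small guard (my addition, not part of the original answer) makes the failure explicit:
# Fail early with a clear message if the env is not registered with conda
if ! /home/ec2-user/anaconda3/bin/conda env list | grep -q "^python3 "; then
    echo "conda environment 'python3' not found" >&2
    exit 1
fi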
After that, you can create a .sh script inside /home/ec2-user/SageMaker containing all the code to run. This way it also becomes versionable, since it is a persisted file in the instance space rather than part of an external configuration.
The on-start.sh/on-create.sh (from this point on I will simply call it script.sh) then becomes trivial:
#!/bin/bash
# PARAMETERS
ENVIRONMENT=python3
# conda env
source /home/ec2-user/anaconda3/bin/activate "$ENVIRONMENT"
echo "'$ENVIRONMENT' env activated"
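To tie this back to the original question, the notebook execution can then live in the same script.sh; a minimal sketch (the nbconvert command is taken from the question above, the wrapping is my assumption):
#!/bin/bash
ENVIRONMENT=python3
NOTEBOOK_FILE="/home/ec2-user/SageMaker/prepTimePreProcessing.ipynb"
# conda env
source /home/ec2-user/anaconda3/bin/activate "$ENVIRONMENT"
# run the notebook in the background so the lifecycle hook is not blocked
nohup jupyter nbconvert --to notebook --inplace --ExecutePreprocessor.timeout=600 --ExecutePreprocessor.kernel_name=python3 --execute "$NOTEBOOK_FILE" &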
In the lifecycle config, on the other hand, just write a few lines to invoke the previously created script.sh:
#!/bin/bash
set -e
SETUP_FILE=/home/ec2-user/SageMaker/script.sh
echo "Run setup script"
sh "$SETUP_FILE"
echo "Setup completed!"
Extra
If you want to add a safety check so that the .sh file is read correctly regardless of its line endings, I would also add a conversion:
#!/bin/bash
set -e
SETUP_FILE=/home/ec2-user/SageMaker/script.sh
# convert script to unix format
echo "Converting setup script into unix format"
sudo yum -y install dos2unix > /dev/null 2>&1
dos2unix "$SETUP_FILE" > /dev/null 2>&1
echo "Run setup script"
sh "$SETUP_FILE"
echo "Setup completed!"

Related

"Sagemaker Notebook with Interactive Session -- Install packages

We have followed this doc to spin up a notebook running with interactive sessions. We want to add a few Python packages to the environment to assist with development (e.g. pyright). I have added the pip install at the bottom, stopped the instance, restarted the instance, and run "import pyright", but I get "ModuleNotFoundError: No module named 'pyright'"
#!/bin/bash
set -ex
sudo -u ec2-user -i <<'EOF'
ANACONDA_DIR=/home/ec2-user/anaconda3
# Create and Activate Conda Env
echo "Creating glue_pyspark conda enviornment"
conda create --name glue_pyspark python=3.7 ipykernel jupyter nb_conda -y
echo "Activating glue_pyspark"
source activate glue_pyspark
# Install Glue Sessions to Env
echo "Installing AWS Glue Sessions with pip"
pip install aws-glue-sessions
# Clone glue_pyspark to glue_scala. This is required because I had to match kernel naming conventions to their environments and couldn't have two kernels in one conda env.
echo "Cloning glue_pyspark to glue_scala"
conda create --name glue_scala --clone glue_pyspark
# Remove python3 kernel from glue_pyspark
rm -r ${ANACONDA_DIR}/envs/glue_pyspark/share/jupyter/kernels/python3
rm -r ${ANACONDA_DIR}/envs/glue_scala/share/jupyter/kernels/python3
# Copy kernels to Jupyter kernel env (Discoverable by conda_nb_kernel)
echo "Copying Glue PySpark Kernel"
cp -r ${ANACONDA_DIR}/envs/glue_pyspark/lib/python3.7/site-packages/aws_glue_interactive_sessions_kernel/glue_pyspark/ ${ANACONDA_DIR}/envs/glue_pyspark/share/jupyter/kernels/glue_pyspark/
echo "Copying Glue Spark Kernel"
mkdir ${ANACONDA_DIR}/envs/glue_scala/share/jupyter/kernels
cp -r ${ANACONDA_DIR}/envs/glue_scala/lib/python3.7/site-packages/aws_glue_interactive_sessions_kernel/glue_spark/ ${ANACONDA_DIR}/envs/glue_scala/share/jupyter/kernels/glue_spark/
echo "Changing Jupyter kernel manager from EnvironmentKernelSpecManager to CondaKernelSpecManager"
JUPYTER_CONFIG=/home/ec2-user/.jupyter/jupyter_notebook_config.py
sed -i '/EnvironmentKernelSpecManager/ s/^/#/' ${JUPYTER_CONFIG}
echo "c.CondaKernelSpecManager.name_format='conda_{environment}'" >> ${JUPYTER_CONFIG}
echo "c.CondaKernelSpecManager.env_filter='anaconda3$|JupyterSystemEnv$|/R$'" >> ${JUPYTER_CONFIG}
# Install python modules to env
pip install "pyright"
EOF
systemctl restart jupyter-server
Am I missing something in the script? I assumed a plain pip install "pyright" would have worked.
Update:
I have included the following under the pip install aws-glue-sessions:
pip install "pyright"
and
pip install pyright
When I check the CloudWatch logs, I see that the package is being downloaded, so I would assume that means it's installed.
[1]: https://i.stack.imgur.com/JeKce.png
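One way to confirm which environment the package actually landed in (a suggestion on my part, not from the original post) is to call that environment's pip binary directly, which removes any ambiguity about the active interpreter:
# Install into, and inspect, the glue_pyspark env explicitly
/home/ec2-user/anaconda3/envs/glue_pyspark/bin/pip install pyright
/home/ec2-user/anaconda3/envs/glue_pyspark/bin/pip show pyright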

ENTRYPOINT just refuses to exec or even shell run

This is my third day of tearing my hair out since the weekend, and I just cannot get ENTRYPOINT to work via GitLab Runner 13.3.1, for something that previously worked with a simple ENTRYPOINT ["/bin/bash"]. That, however, was using local Docker Desktop with docker run followed by docker exec commands, which worked like a cinch. Essentially, at the end of it all, I previously got a WAR file built.
Currently I build my container in GitLab Runner 13.3.1, push it to an S3 bucket, and then use image: localhost:500/my-recently-builtcontainer and try to do whatever I want with the container. But I cannot even get ENTRYPOINT to work, in its exec form or in shell form - at least in the shell form I get to see something. In the exec form it just gave opaque "OCI runtime create failed" errors, so I shifted to the shell form just to see where I could get.
I keep getting
sh: 1: sh: echo HOME=/home/nonroot-user params=#$ pwd=/ whoami=nonroot-user script=sh ENTRYPOINT reached which_sh=/bin/sh which_bash=/bin/bash PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin; ls -alrth /bin/bash; ls -alrth /bin/sh; /usr/local/bin/entrypoint.sh ;: not found
In my Dockerfile I distinctly have
COPY entrypoint.sh /usr/local/bin/entrypoint.sh
RUN bash -c "ls -larth /usr/local/bin/entrypoint.sh"
ENTRYPOINT "echo HOME=${HOME} params=#$ pwd=`pwd` whoami=`whoami` script=${0} ENTRYPOINT reached which_sh=`which sh` which_bash=`which bash` PATH=${PATH}; ls -alrth `which bash`; ls -alrth `which sh`; /usr/local/bin/lse-entrypoint.sh ;"
The output after I build the container in GitLab is as follows - and I made sure anyone has rights to see this file and use it, just so that I can proceed with my work:
-rwxrwxrwx 1 root root 512 Apr 11 17:40 /usr/local/bin/entrypoint.sh
So I know it is there, and the chmod flags indicate anybody can read and execute it - so I am perplexed why it says NOT FOUND:
/usr/local/bin/entrypoint.sh ;: not found
entrypoint.sh is ...
#!/bin/sh
export PATH=$PATH:/usr/local/bin/
clear
echo Script is $0
echo numOfArgs is $#
echo paramtrsPassd is $#
echo whoami is `whoami`
bash --version
echo "About to exec ....."
exec "$#"
It does not even reach inside this entrypoint.sh file.
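For reference, $# expands to the number of arguments while "$@" expands to the arguments themselves, so exec "$#" attempts to execute a command literally named after the argument count. A minimal exec-form setup (a sketch, not the poster's actual files) would be:
#!/bin/sh
# entrypoint.sh: print diagnostics, then hand control to the passed command
echo "HOME=${HOME} pwd=$(pwd) whoami=$(whoami)"
exec "$@"
with the Dockerfile using the exec form:
COPY entrypoint.sh /usr/local/bin/entrypoint.sh
RUN chmod +x /usr/local/bin/entrypoint.sh
ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
CMD ["/bin/bash"]
The exec form avoids the intermediate sh -c invocation, which is where the quoting of the long shell-form string above goes wrong.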

How to pass a variable to a script in user_data

I am trying to run a bash script file in user_data that prompts the user for a domain. Here is the domain part of the commands from the .sh file itself:
DOMAIN=$1
if [ -z "$1" ]
then
    echo ""
    printf "Enter the domain you want to host BookStack and press [ENTER]\nExamples: my-site.com or docs.my-site.com\n"
    read DOMAIN
fi
I would like to pass my EIP, aws_eip.one.public_ip, as an input to the script.
Here are the actual commands that are run in the user_data section:
#!/bin/bash
# Ensure you have read the above information about what this script does before executing these commands.
sudo apt install -y wget
# Download the script
wget https://raw.githubusercontent.com/BookStackApp/devops/main/scripts/installation-ubuntu-18.04.sh
# Make it executable
chmod a+x installation-ubuntu-18.04.sh
# Run the script with admin permissions
sudo ./installation-ubuntu-18.04.sh $  # <- this is where I would like to pass my EIP variable
Appreciate the help!
Get the IP from the EC2 metadata in your user data:
curl http://169.254.169.254/latest/meta-data/public-ipv4
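Putting it together, a user_data sketch (my assembly of the two pieces above) that feeds the metadata value to the installer as $1, so the interactive prompt is skipped:
#!/bin/bash
# Fetch this instance's public IP from the instance metadata service
PUBLIC_IP=$(curl -s http://169.254.169.254/latest/meta-data/public-ipv4)
# Pass it as the first argument, which the script reads into DOMAIN
sudo ./installation-ubuntu-18.04.sh "$PUBLIC_IP"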

How to run AWS SageMaker lifecycle config scripts as a background job

I am trying to customize Amazon SageMaker Notebook Instances using Lifecycle Configurations because I need to install additional pip packages. What this means is that I have to create an on-start.sh and an on-create.sh script within a lifecycle configuration. You can see a sample here.
Now, I have many packages, and the installation time might go over 5 minutes, causing a potential timeout. It is suggested to use nohup to run the script as a background job in that case.
But how do I run this with nohup, since I do not have a terminal in this case? Is there a way to run the script as a background job from within the script itself? Anything else I am missing? Please suggest.
I have done this before, installing many libraries over around 15 minutes. I wrapped the script I actually wanted to run in a create.sh and ran that create.sh using nohup. You can view the logs in CloudWatch, the SageMaker start won't time out, and as a plus you will have a nohup.out file in the directory where you executed the nohup.
Below, I have wrapped the script from https://github.com/aws-samples/amazon-sagemaker-notebook-instance-lifecycle-config-samples/tree/master/scripts/export-to-pdf-enable into create.sh:
#!/bin/bash
set -e
# Write the real work into create.sh via a heredoc. Note: the inner heredoc
# below reuses the 'EOF' delimiter, so the EOF line near the end terminates
# the OUTER cat; the echo afterwards appends the missing inner terminator.
cat <<'EOF'>create.sh
#!/bin/bash
sudo -u ec2-user -i <<'EOF'
set -e
# OVERVIEW
# This script enables Jupyter to export a notebook directly to PDF.
# nbconvert depends on XeLaTeX and several LaTeX packages that are non-trivial to
# install because `tlmgr` is not included with the texlive packages provided by yum.
# REQUIREMENTS
# Internet access is required in on-create.sh in order to fetch the latex libraries from the ctan mirror.
sudo yum install -y texlive*
unset SUDO_UID
ln -s /home/ec2-user/SageMaker/.texmf /home/ec2-user/texmf
EOF
# Append the 'EOF' terminator for the inner heredoc inside create.sh
echo 'EOF' >> create.sh
# Run create.sh in the background so the lifecycle config does not time out
nohup bash create.sh &

Script works when run manually but not when automated - AWS

I am writing a script, and when I execute it manually inside my RHEL EC2 instance, it works as expected.
However, when I try to automate it using a CloudFormation template (that means putting it in an S3 bucket and downloading it from there in user-data), it does not run.
The script contains the following commands, which append Oracle environment variables to /home/ec2-user/.bash_profile:
sudo sed -e '11 a export ORACLE_HOME=/usr/lib/oracle/12.1/client64' -i /home/ec2-user/.bash_profile
sudo sed -e '12 a export LD_LIBRARY_PATH=$ORACLE_HOME/lib' -i /home/ec2-user/.bash_profile
sudo sed -e '13 a export PATH=$ORACLE_HOME/bin:$PATH' -i /home/ec2-user/.bash_profile
By elevating the permissions of the script, it worked.
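A minimal sketch of what that flow can look like in user-data (bucket name and paths are placeholders, not from the original post):
#!/bin/bash
# Hypothetical user-data: download the script from S3, elevate its permissions, run it
aws s3 cp s3://<your-bucket>/setup.sh /tmp/setup.sh
chmod +x /tmp/setup.sh   # <- the elevated permission that made it work
/tmp/setup.sh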