The JupyterLab health check is intermittently failing for my AI Notebook. The Jupyter service still works even when the health check fails: my queries still execute successfully, and I'm able to open my notebook by clicking "Open Jupyterlab".
How can I debug why this health check is failing?
This is a new feature in Jupyter Notebooks: there is now a health agent that runs inside the Notebook, and this specific check verifies the status of the Jupyter service.
The check is:
if [ "$(systemctl show --property ActiveState jupyter | awk -F "=" '{print $2}')" == "active" ]; then echo 1; else echo -1; fi
You can use gcloud notebooks instances get-health to get more details.
Which DLVM version/image and machine type are you using?
If you are using a container-based instance, the check is:
docker exec -i payload-container ps aux | grep /opt/conda/bin/jupyter-lab | grep -v grep >/dev/null 2>&1; test $? -eq 0 && echo 1 || echo -1
Also make sure that report-container-health and report-system-health are both set to True.
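To catch an intermittent failure as it happens, one option is to run the same check yourself in a loop and log each result with a timestamp. This is only a minimal sketch, assuming the systemd-based check quoted above; the function names state_to_health and watch_jupyter_health and the 30-second default interval are illustrative and are not part of the health agent.

```shell
#!/bin/bash
# Map a line such as "ActiveState=active" to 1 (healthy) or -1 (unhealthy),
# mirroring the systemd check quoted above.
state_to_health() {
  local state
  state=$(printf '%s\n' "$1" | awk -F "=" '{print $2}')
  if [ "$state" == "active" ]; then echo 1; else echo -1; fi
}

# Log a timestamped result every INTERVAL seconds (default 30) until interrupted.
# Hypothetical helper: run it on the notebook VM, e.g. inside an SSH session.
watch_jupyter_health() {
  local interval=${1:-30}
  while :; do
    echo "$(date -u +%FT%TZ) $(state_to_health "$(systemctl show --property ActiveState jupyter)")"
    sleep "$interval"
  done
}
```

Running watch_jupyter_health 10 | tee health.log on the instance would then show whether the service ever leaves the active state (for example during a restart) at the times the health check reports a failure.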
I have created 4 instances in the Notebooks tab of AWS SageMaker.
I want to create a lifecycle configuration that stops the instances every day at 9:00 PM.
I have seen some examples, but they stop the instance after a period of idle time rather than at a specific time:
#!/bin/bash
set -e
# PARAMETERS
IDLE_TIME=3600
echo "Fetching the autostop script"
wget -O autostop.py https://raw.githubusercontent.com/mariokostelac/sagemaker-setup/master/scripts/auto-stop-idle/autostop.py
echo "Starting the SageMaker autostop script in cron"
(crontab -l 2>/dev/null; echo "*/5 * * * * /bin/bash -c '/usr/bin/python3 $DIR/autostop.py --time ${IDLE_TIME} | tee -a /home/ec2-user/SageMaker/auto-stop-idle.log'") | crontab -
echo "Changing cloudwatch configuration"
curl https://raw.githubusercontent.com/mariokostelac/sagemaker-setup/master/scripts/publish-logs-to-cloudwatch/on-start.sh | sudo bash -s auto-stop-idle /home/ec2-user/SageMaker/auto-stop-idle.log
Can anyone help me out on this one?
Change the crontab schedule to 0 21 * * * so that it runs shutdown.py every day at 21:00.
Then create a shutdown.py, a reduced version of autostop.py that mainly contains:
...
print('Closing notebook')
client = boto3.client('sagemaker')
client.stop_notebook_instance(NotebookInstanceName=get_notebook_name())
BTW: triggering "shutdown now" directly from the crontab command didn't work for me, so I call the SageMaker API instead.
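For the lifecycle configuration itself, the cron line can be built and installed the same way the idle-time example does it. A minimal sketch, assuming shutdown.py has already been placed on the instance; the helper names daily_cron_entry and install_cron_entry are made up for illustration, and cron uses the instance's own time zone, which may not be your local one.

```shell
#!/bin/bash
# Build a crontab entry that runs a Python script every day at HOUR:00 (0-23).
daily_cron_entry() {
  local hour=$1 script=$2
  echo "0 ${hour} * * * /usr/bin/python3 ${script}"
}

# Append an entry to the current user's crontab, keeping existing entries.
install_cron_entry() {
  (crontab -l 2>/dev/null; echo "$1") | crontab -
}

# Example (not executed here), using an assumed location for shutdown.py:
#   install_cron_entry "$(daily_cron_entry 21 /home/ec2-user/SageMaker/shutdown.py)"
```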
Most of the AWS CLI calls are asynchronous.
Therefore, after you call them, you have no idea whether the end result was successful.
Is there a simple way to check that, for example, an environment was created successfully, other than writing timed polling verification calls?
Sorry, I did not mention this previously, but I am specifically looking for solutions in PowerShell.
You can check the exit status of the CLI command.
What is an exit code in bash shell?
Every Linux or Unix command executed by the shell script or user has
an exit status. Exit status is an integer number. 0 exit status means
the command was successful without any errors
code snippet:
aws cli command
if [ $? -ne 0 ]
then
echo "Error"
exit 1;
else
echo "Passed"
fi
Another method is to poll until the command returns the value you are waiting for:
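while :
do
    sleep 10
    echo "Waiting for elasticsearch domain endpoint..."
    # "local" is only valid inside a function, so a plain assignment is used here.
    ELASTICSEARCH_ENDPOINT=$(aws es describe-elasticsearch-domain --domain-name "${ES_DOMAIN_NAME}" --region "${AWS_REGION}" --output text --query 'DomainStatus.Endpoints.vpc')
    # With --output text a missing value is printed as "None" rather than "null".
    if [ -n "${ELASTICSEARCH_ENDPOINT}" ] && [ "${ELASTICSEARCH_ENDPOINT}" != "None" ] && [ "${ELASTICSEARCH_ENDPOINT}" != "null" ]
    then
        echo "Elasticsearch endpoint: ${ELASTICSEARCH_ENDPOINT}"
        break
    fi
done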
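The loop above never gives up if the resource is never created. A bounded variant can be sketched as a small reusable function; poll_until and endpoint_ready are hypothetical names used only for this illustration, and the sketch is in bash rather than PowerShell.

```shell
#!/bin/bash
# Run a command repeatedly until it succeeds or the attempts run out.
# Usage: poll_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 if every attempt failed.
poll_until() {
  local max_attempts=$1 delay=$2 attempt
  shift 2
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if ((attempt < max_attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Example (not executed here): wrap the readiness test in a function and poll it.
#   endpoint_ready() { [ "$(aws es describe-elasticsearch-domain ... )" != "None" ]; }
#   poll_until 60 10 endpoint_ready || echo "Timed out"
```

Some AWS CLI services also provide built-in wait subcommands (for example aws ec2 wait instance-running) that do this polling for you, so it is worth checking whether one exists for the resource in question.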
I'm trying to install a Monitoring Agent on my f1-micro instance, which runs Debian 9 and a dockerised application. I'm following the https://cloud.google.com/monitoring/agent/install-agent#linux-install tutorial. When I execute sudo bash install-monitoring-agent.sh I get the message "Unidentifiable or unsupported platform."
Am I doing anything wrong?
Inside the bash script you will see that it reads the OS details from /etc/os-release and, if that file is missing, uses the fallback below:
============================================================================
# Fallback for systems lacking /etc/os-release.
if [[ -f /etc/debian_version ]]; then
echo 'Installing agent for Debian.'
install_for_debian
elif [[ -f /etc/redhat-release ]]; then
echo 'Installing agent for Red Hat.'
install_for_redhat
else
echo >&2 'Unidentifiable or unsupported platform.'
echo >&2 "See ${MONITORING_AGENT_SUPPORTED_URL} for a list of supported platforms."
exit 1
fi
Check your /etc/os-release.
Debian is normally supported; I installed the agent on an f1-micro machine with Debian and it worked perfectly. The output should look like this:
~$ sudo cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
NAME="Debian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
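To compare your instance against the output above without reading the whole file, you can print just the distribution ID. A minimal sketch; the excerpt does not show which fields the installer actually reads, so treating ID as the relevant one is an assumption, and os_release_id is a made-up helper name.

```shell
#!/bin/bash
# Print the ID field of an os-release style file (default: /etc/os-release).
# Prints "missing" and returns 1 if the file cannot be read.
os_release_id() {
  local file=${1:-/etc/os-release}
  [ -r "$file" ] || { echo "missing"; return 1; }
  # Source in a subshell so the file's variables do not leak into this shell.
  ( . "$file" && echo "${ID:-unknown}" )
}

# Example (not executed here):
#   os_release_id    # prints the ID value, e.g. the ID=debian line shown above
```

If this prints "missing" or an unexpected ID on your instance, that would be consistent with the "Unidentifiable or unsupported platform" message.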
Startup scripts are not behaving the way that I expected them to.
I have a .sh file in a storage bucket and have set the startup-script-url metadata key to gs://bucket-name/start-script.sh
[  OK  ] Started Google Compute Engine Accounts Daemon.
Starting Google Compute Engine Startup Scripts...
[  OK  ] Started Google Compute Engine Shutdown Scripts.
[  OK  ] Started Google Compute Engine Startup Scripts.
[  OK  ] Started Docker Application Container Engine.
[  OK  ] Started Wait for Google Container Registry (GCR) to be accessible.
[  OK  ] Reached target Google Container Registry (GCR) is Online.
[  OK  ] Started Containers on GCE Setup.
[ 8.001248] konlet-startup[1018]: 2018/03/08 20:23:56 Starting Konlet container startup agent
The below script is not executed as expected. I tried using the startup-script metadata tag and using something simple like echo "hello" but that's not working either. I have full Cloud API access scopes enabled.
If I copy the contents of the below file and paste it into the ssh terminal it works perfectly.
Could really use some help =D
start-script.sh
#! /bin/bash
image_name=gcr.io/some-image:version-2
docker_images=$(docker inspect --type=image $image_name)
array_count=${#docker_images[0]}
# Check if docker image is available
while ((array_count == 2));
do
docker_images=$(docker inspect --type=image ${image_name})
array_count=${#docker_images[0]}
if (($array_count > 2)); then
break
fi
done
################################
#
# Docker image now injected
# by google compute engine
#
################################
echo "docker image ${image_name} available"
container_ids=($(docker container ls | grep ${image_name} | awk '{ print $1}'))
for (( i=0; i < ${#container_ids[@]}; i++ ));
do
# run command for each container
container_id=${container_ids[i]}
echo "running commands for container: ${container_ids[i]}"
# start cloud sql proxy
docker container run -d -p $HOST_PORT:$APPLICATION_PORT \
${container_ids[i]} \
./cloud_sql_proxy \
-instances=$PG_INSTANCE_CONNECTION_NAME=tcp:$PG_PORT \
-credential_file=$PG_CREDENTIAL_FILE_LOCATION
# HTTP - Start unicorn webserver
docker exec -d \
${container_ids[i]} \
bundle exec unicorn -c config/unicorn.rb
done
Okay... after some scenario testing... found out that startup scripts do not run if you use the "Deploy a container image to this VM instance" option. Hope this saves you from tearing your hair out haha.
Startup scripts always run when you use the "Deploy a container image to this VM instance" option.
You can use sudo journalctl -u google-startup-scripts.service -f to check the result of the script run.
You can use sudo google_metadata_script_runner -d --script-type startup to debug the script.
Update (2021-11-09): the command is now sudo google_metadata_script_runner startup.
doc: https://cloud.google.com/compute/docs/instances/startup-scripts
Startup scripts for Container-Optimised OS must be configured differently. Use the user-data metadata tag, and pass it a cloud-config configuration. The example from the docs is below.
#cloud-config
bootcmd:
- fsck.ext4 -tvy /dev/DEVICE_ID
- mkdir -p /mnt/disks/MNT_DIR
- mount -t ext4 -O ... /dev/DEVICE_ID /mnt/disks/MNT_DIR
I had a similar issue after I removed execute permissions for files in /tmp. Keep in mind that startup scripts are copied to /tmp/ and then run from there.
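One way this can happen is when /tmp is mounted with the noexec option. The sketch below checks a mount-options string for that flag; whether this matches the exact situation described above is an assumption, and has_noexec is a hypothetical helper name.

```shell
#!/bin/bash
# Return 0 if a comma-separated mount options string contains "noexec".
has_noexec() {
  case ",$1," in
    *,noexec,*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (not executed here): inspect the filesystem that holds /tmp.
#   if has_noexec "$(findmnt -no OPTIONS --target /tmp)"; then
#     echo "/tmp is mounted noexec; scripts copied there cannot be executed"
#   fi
```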
I have a bash script that I would like to run continuously on a Google Cloud server. I connected to my VM via SSH in the browser, but after I closed the browser the script stopped.
I tried to use Cloud Shell, but if I restart my laptop the script starts over from the beginning. It doesn't run continuously!
Is it possible to launch my script in Google Cloud, shut down my laptop, and be sure that the script keeps running?
The solution: GNU screen. This awesome little tool lets you run a process after you've ssh'ed into your remote server and then detach from it, leaving it running as it would in the foreground (not stopped in the background).
So after we've ssh'ed into our GCE VM, we will need to:
1. install GNU screen:
apt-get update
apt-get upgrade
apt-get install screen
2. type "screen". This will open up a new screen, similar in look and feel to what "clear" would result in.
3. run the process (e.g.: ./init-dev.sh to fire up a ChicagoBoss erlang server)
4. type: Ctrl + A, and then Ctrl + D. This will detach your screen session but leave your processes running!
5. feel free to close the SSH terminal. Whenever you feel like it, ssh back into your GCE VM, and type screen -r to resume your previously detached session.
to kill all detached screens, run:
screen -ls | grep pts | cut -d. -f1 | awk '{print $1}' | xargs kill
You have the following options:
1. Scheduled tasks, which means using cron jobs;
2. Using startup scripts.
I performed the following test and it worked for me:
I created an instance in GCE, SSH-d into it and created the following script, myscript.bash:
#!/bin/bash
sleep 15s
echo Hello World > result.txt
and then, ran
$ bash myscript.bash
and immediately closed the browser window holding the SSH session.
I then waited for at least 15 seconds, re-engaged in an SSH connection with the VM in question and ran $ ls and voila:
myscript.bash result.txt
So the script ran even after closing the browser holding the SSH session.
Still, technically, I believe your solution lies with 1. or 2.
You can use nohup and run the script in the background:
nohup ./yourscript.sh > output_log_file.log 2>&1 &
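The same idea can be wrapped in a small helper that also reports the process ID, so the job can be checked or stopped after reconnecting. A minimal sketch; run_detached is a made-up name, and the log file and script paths are placeholders.

```shell
#!/bin/bash
# Start a command with nohup in the background, send stdout and stderr to a
# log file, and print the PID of the background job.
# Usage: run_detached LOG_FILE COMMAND [ARGS...]
run_detached() {
  local log_file=$1
  shift
  nohup "$@" > "$log_file" 2>&1 &
  echo "$!"
}

# Example (not executed here):
#   run_detached output_log_file.log ./yourscript.sh
#   # later, after reconnecting: tail -f output_log_file.log
```

Because the output is redirected to a file and the job ignores the hangup signal, closing the SSH window should not stop it; keep the printed PID if you want to kill the job later.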
I faced a similar issue. I logged into the virtual machine with the gcloud command on my local machine and then closed the terminal to exit, which halted the script running on the instance.
Instead, use the exit command to log out of the cloud console in the local PuTTY console (twice).
Make sure you have not enabled the preemptible option ("PREEMPT INSTANCE") when creating the VM instance.
A preemptible instance costs much less, but it is forcibly stopped within 24 hours.
I have a NodeJS project and I solved it with pm2.