Programmatically enable installed extensions in Vertex AI Managed Notebook instance - google-cloud-platform

I am working in JupyterLab within a Managed Notebook instance, accessed through the Vertex AI workbench, as part of a Google Cloud Project. When the instance is created, there are a number of JupyterLab extensions that are installed by default. In the web GUI, one can click the puzzle piece icon and enable/disable all extensions with a single button click. I currently run a post-startup bash script to manage environments and module installations, and I would like to add to this script whatever commands would turn on the existing extensions. My understanding is that I can do this with
# Status of extensions
jupyter labextension list
# Enable/disable some extension
jupyter labextension enable extensionIdentifierHere
However, when I test the enable/disable command in an instance Terminal window, I receive, for example
[Errno 13] Permission denied: '/opt/conda/etc/jupyter/labconfig/page_config.json'
If I try to run this with sudo, I am asked for a password, but have no idea what that would be, given that I just built the environment and didn't set any password.
Any insights on how to set this up, what the command(s) may be, or how else to approach this, would be appreciated.
Potentially relevant:
Not able to install Jupyterlab extensions on GCP AI Platform Notebooks
Unable to sudo to Deep Learning Image
https://jupyterlab.readthedocs.io/en/stable/user/extensions.html#enabling-and-disabling-extensions
Edit 1:
Adding more detail in response to answers and comments (#gogasca, #kiranmathew). My goal is to use ipyleaft-based mapping, through the geemap and earthengine-api python modules, within the notebook. If I create a Managed Notebook instance (service account, Networks shared with me, Enable terminal, all other defaults), launch JupyterLab, open the Terminal from the Launcher, and then run a bash script that creates a venv virtual environment, exposes a custom kernel, and performs the installations, I can use geemap and ipywidgets to visualize and modify (e.g., widget sliders that change map properties) Google Earth Engine assets in a Notebook. If I try to replicate this using a Docker image, it seems to break the connection with ipyleaflet, such that when I start the instance and use a Notebook, I have access to the modules (they can be imported) but can't use ipyleaflet to do the visualization. I thought the issue was that I was not properly enabling the extensions, per the "Error displaying widget: model not found" error, addressed in this, this, this, this, etc. -- hence the title of my post. I tried using and modifying #TylerErickson 's Dockerfile that modifies a Google deep learning container and should handle all of this (here), but both the original and modifications break the ipyleaflet connection when booting the Managed Notebook instance from the Docker image.

Google Managed Notebooks do not support third-party JL extensions. Most of these extensions require a rebuild of the JupyterLab static assets bundle. This requires root access which our Managed Notebooks do not support.
Untangling this limitation would require a significant change to the permission and security model that Managed Notebooks provides. It would also have implications for the supportability of the product itself since a user could effectively break their Managed Notebook by installing something rogue.
I would suggest to use User Managed Notebooks.

Related

Clone a google cloud VM

I have google cloud VM with Ubuntu installed along with various services and libraries. I need to make a similar bootable VM with the same OS and all the data, libraries etc in the already configured VM. How do I clone the VM with these requirements?
I tried to create an image from the already existing VM and could not SSH into it.
So I retraced my installations step by step trying to figure out which step is breaking the image.
I created an Ubuntu(18.04) VM and used that to create an image. The instance I created using the image did allow me to SSH into.
Next installed Ubuntu desktop and xorg server and created an image after that. Using that image, I created a new VM and tried to SSH into it.
But unfortunately, the SSH connection could not be established. So I think it is these installations that are causing the error if it is not some sort of system error.
Below are the exact commands I ran to install these after creating an Ubuntu(18.04) VM:
sudo passwd username
sudo su -
passwd
apt update && apt upgrade -y
adduser username root
adduser username admin
adduser username sudo
apt-get install ubuntu-desktop -y
apt-get install xserver-xorg-video-dummy
nano /etc/X11/xorg.conf
and pasted the following into the .conf file
Section "Device"
Identifier "Configured Video Device"
Driver "dummy"
EndSection
Section "Monitor"
Identifier "Configured Monitor"
HorizSync 31.5-48.5
VertRefresh 50-70
EndSection
Section "Screen"
Identifier "Default Screen"
Monitor "Configured Monitor"
Device "Configured Video Device"
DefaultDepth 24
SubSection "Display"
Depth 24
Modes "1600x900"
EndSubSection
EndSection
After this state, I created the image using which I could not instantiate a VM that I could SSH into.
Since you have your VM ready and running; backup your image as per this GCP document. Follow the guidelines before you begin the process which were mentioned in the document like updating Google cloud CLI setting default region and zone and for general image guidelines.
Few networking features may require guest operating system mode. You can also check how to export a custom image to cloud storage.
You can also consider the Snapshot Approach.
Follow this process in order to create the image exactly as the one you have already set up and you know is working correctly. As you may already know, this is a custom image so they are available only to your Cloud project. You can create a custom image from boot disks and other images if you would like also. Then, use the custom image to create an instance.
I will also suggest you to give a look at this document which would give you a deeper knowledge on the task.
Regards,
Just spin up a new container from a disk snapshot, if you need an exact copy. And if you cannot SSH, you may either not have a SSH public key provisioned, no external IP assigned, or :22 closed.
gcloud ssh always works. One can as well provision project-wide SSH keys, which all VM in the project will inherit then. The documentation below: About VM metadata explains this all in detail.
My personal favorite are rather startup scripts, which describe the configuration, instead of copying it.
And it's not so difficult to get started with these: cat ~/.bash_history > rocky8_startup.sh. In a software-defined data-center, it might make sense to use software-defined configurations (one simply cannot alternate the installation per VM instance, when starting with a disk snapshot).
xserver-xorg-video-dummy is questionable, because one can enable display device -but unless recording the screen, this driver might still suffice; eg. for VNC sessions.

Can you load standard zeppelin interpreter settings from S3?

Our company is building up a suite of common internal Spark functions and jobs, and I'd like to make sure that our data scientists have access to all of these when they prototype in Zeppelin.
Ideally, I'd like a way for them to start up a Zeppelin notebook on AWS EMR, and have the dependency jar we build automatically loaded onto it without them having to manually type in the maven information manually every time (private repo location/credentials, package info, etc).
Right now we have the dependency jar loaded on S3, and with some work we could get a private maven repository to host it on.
I see that ZEPPELIN_INTERPRETER_DIR saves off interpreter settings, but I don't think it can load from a common default location (like S3, or something)
Is there a way to tell Zeppelin on an EMR cluster to load it's interpreter settings from a common location? I can't be the first person to want this.
Other thoughts I've had but have not tried yet:
Have a script that uses aws cmd line options to start a EMR cluster with all the necessary settings pre-made for you. (Could also upload the .jar dependency if we can't get maven to work)
Use a infrastructure-as-code framework to start up the clusters with the required settings.
I don't believe it's possible to tell EMR to load settings from a common location. The first thought you included is the way to go imo - you would aws emr create ... and that create would include a shell script step to replace /etc/zeppelin/conf.dist/interpreter.json by downloading the interpreter.json of interest from S3, and then hard restart zeppelin (sudo stop zeppelin; sudo start zeppelin).

Is it possible to SSH in AWS instances using any IDEs such PYCHARM?

I am stuck in a technical issue on a project and I think you the forum could help me out.
I have an EC2 Instance Type:p2.xlarge running on AWS, I cloned a repository in this instance which requires pytorch and cuda dependencies(this point has been taken care of).
Now, The issue is that I wanna work & run this code-base(which is is AWS instance now) somehow in my local pyCHARM IDE. In short, I didn't have proper resources on my laptop to run the repository, so I have to run in an AWS instance but for debugging purposes the local IDE would be a great option.
Is it possible to do that?. In other words, we can do SSH into AWS instance and run code, but all will be done through command line, if we could SSH through PYCHARM and can see the code in AWS here in local machine within PYCHARM and change, debug or run it as it was local but actually it gets executed in the instance.
Please suggest a solution to it.
Thanks in advance.
EDIT-1:
After following, #Cromulent suggestion, I have arrived here
Setting the remote:
Upload happening within the local & remote repo.
I still didn't understand the requirement of syncing the local and remote folders, when I only want to open the remote folder in my PYCHARM IDE and work on it.
I think after this setup, I have to change the code in local copy and the PYCHARM will sync the code in remote copy. How will I be running(using resources-GPUs of the remote Instance, not my local machine.) the remote code in PYCHARM in this scenario, I am just syncing it, for running again I have to ssh through command line and run the script(This does not serve the purpose)?
EDIT-2:
After #Cromulent suggestions.
Actually, it did work, but still, I am not able to run the remote code locally.
I am getting the below error while running any remote script. If I run the same script using ssh in the terminal, the scripts run normally. I tried to fix the problem using this post on StackOverflow, but it didn't work too.
ssh://ubuntu#ec2-52-41-247-169.us-west-2.compute.amazonaws.com:22/home/ubuntu/anaconda3/bin/python -u <08ad9807-3477-4916-96ce-ba6155e3ff4c>/home/ubuntu/InsightProject/scripts/download_flownet2.py
/home/ubuntu/anaconda3/bin/python: can't open file '<08ad9807-3477-4916-96ce-ba6155e3ff4c>/home/ubuntu/InsightProject/scripts/download_flownet2.py': [Errno 2] No such file or directory
The below is the screenshot for the above problem:
PyCharm Professional supports remote Python interpreters (either the globally installed Python interpreter or a virtualenv). It works by creating an SSH connection to the server and then running the code on the remote host. The results are then displayed locally in PyCharm Professional. You can also do remote debugging as well.
You MUST be using the professional version of PyCharm though. The free community version does not support this feature.
You can find the documentation here:
https://www.jetbrains.com/help/pycharm/configuring-remote-interpreters-via-ssh.html
One more solution is to deploy a Jupyter Notebook on your remote server. Then you will be able to use it from PyCharm Professional Edition.
Don't forget to make rules for the jupyter ports (e.g. allow all 8888) in your AWS console and in your instance.
To configure a remote interpreter for your notebook do this (source):
Open the Jupyter Notebook page of the Settings/Preferences dialog.
On this page, select or clear the Markdown cells rendering enabled option, and specify the username and password. Note that for the
single-user notebooks these fields are optional - leave them blank.
Fill in the username (for JupyterHub) and password.
Click the link Configure remote interpreter. You'll find yourself at the Project Interpreter page.
Configure the remote interpreter, as described in the section Configuring Python Interpreter.
You will want to configure a remote interpreter.
I tried the above approach but it didn't work for me. I have edited my post so that I can get additional input from the community, but I didn't any after the first answer was posted.
My friend actually figured out a secondary way to fix the issue. He actually uses "NOMACHINE" on the local machine and open connection to the remote desktop. Then you can directly install PYCHARM in the remote machine and work in there. I hope this will help others.
The solution is in his blog post. (Thanks to Shaobo Guan)
Another solution would be to use VNC instead of NoMachine

Programming the Pepper robot without "Choreography" software?

Usually the developer can use Softbanks own software Choreography to give programs to Pepper robot.
Isn't there a way to setup a different development environment? e.g. Access via SSH and creating Python scripts with a simple text editor and starting the script manually? It means writing and starting Python scripts for Pepper without using Choreography.
You can also use qibuild (pip install qibuild) : https://github.com/aldebaran/qibuild
It contains a qipkg command, just run
qipkg deploy-package path/to/your/file.pml --url USER#IP:/home/nao
A pml file is a project, it is created by Choregraph, or you can use this tool :
https://github.com/pepperhacking/robot-jumpstarter
in order to get a sample app.
Of course, using Choregraphe is not an obligation, you can use the different SDKs directly.
You can for instance create a python script on your computer, copy it on the robot
scp path/to/script/myscript.py nao#robotIp
And then ssh onto the robot and launch the script
ssh nao#robotIp
python myscript.py
You can also ssh onto the robot, create a script (using nano for instance) and launch it from there.
I've been using Pycharm Pro for 6 months and I am happy with it. You get automatic deployment and remote debugging. The most basic setup must still be done with Choregraphe, but it takes less than one minut.

Can I run gcloud components update?

Will updating gcloud components from within my Google Cloud Shell instance persist?
Will updating anything, like Go or NPM, that is pre-installed with Google Cloud Shell persist?
Yes, depending upon where you install those tools.
When you init a new cloud shell, you get a disk for yourself, and the system image is constructed using a template. So any changes that you do to your disk will persist, while anything you do to core image, will not.
All the pre-installed tools are part of the system image that is updated for all the users and is maintained by GCP team. If you are updating or switching versions there, they will not persist.
But if you want to install custom tools, or switch to a specific version, you can install those tools at your $HOME. All those tools will be installed in your disk and hence will persist across termination/relaunches.