I have been using both Google Colab and GCP VM instances for training some deep learning models.
With Google Colab, I haven't had any issues so far with installing a wide variety of deep learning packages that I explore as part of my work. With GCP VM instances though (even using the pre-configured Deep Learning VMs offered by Google), I frequently run into issues.
I was wondering if there is a way to export the Google Colab VM as an image to Google Cloud Storage, and then launch an instance in GCP using this image.
I tried searching online for this, but I couldn't find anything.
Is this doable? Or are there any other alternatives?
Thanks!
I'm trying to save budget on Jupyter notebooks on Google Cloud, but I couldn't find a way to run Vertex AI Workbench (Notebooks) on spot machines.
What are my alternatives?
The short answer is no; the better answer is: you have an alternative.
Vertex AI Workbench is indeed a managed service with a Compute Engine VM as the underlying infrastructure. However, it doesn't support Spot/Preemptible instances.
Instead, you can quickly create a VM from one of Google's deep/machine learning images. See this detailed tutorial.
Deep Learning VMs lack some Workbench features, such as launching from the GCP Console and collaborative coding. But they do support Spot/Preemptible instances and don't add a management fee. So you get a leaner experience, but you also pay less.
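As an illustration, here is a minimal sketch of creating such a VM as a Spot instance with the google-cloud-compute client library. The project, zone, machine type, and image family are assumptions, and the SPOT provisioning model requires a recent version of the library:

```python
# A sketch, not a definitive recipe: creates a Spot VM from a Deep Learning
# VM image family. All names here are placeholders.
from google.cloud import compute_v1

project, zone = "my-project", "us-central1-a"

instance = compute_v1.Instance(
    name="dl-spot-vm",
    machine_type=f"zones/{zone}/machineTypes/n1-standard-8",
    disks=[
        compute_v1.AttachedDisk(
            boot=True,
            auto_delete=True,
            initialize_params=compute_v1.AttachedDiskInitializeParams(
                # Latest TensorFlow image from the Deep Learning VM project.
                source_image=(
                    "projects/deeplearning-platform-release/"
                    "global/images/family/tf-latest-cpu"
                ),
                disk_size_gb=100,
            ),
        )
    ],
    network_interfaces=[
        compute_v1.NetworkInterface(network="global/networks/default")
    ],
    # Spot capacity is cheaper but can be reclaimed at any time.
    scheduling=compute_v1.Scheduling(
        provisioning_model="SPOT",
        on_host_maintenance="TERMINATE",
        automatic_restart=False,
    ),
)

operation = compute_v1.InstancesClient().insert(
    project=project, zone=zone, instance_resource=instance
)
operation.result()  # block until the instance is ready
```

The same can be done with a single gcloud compute instances create command; the client library is handy when provisioning has to happen programmatically.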
I am trying to build an app where the user is able to upload a file to Cloud Storage. This would then trigger a model training process (and prediction later on). Initially I thought I could do this with Cloud Functions/Pub/Sub and Cloud ML, but it seems that Cloud Functions are not able to run the gsutil commands that Cloud ML needs.
Is my only option to enable Cloud Composer, attach GPUs to a Kubernetes node, and create a Cloud Function that triggers a DAG to boot up a pod on the GPU node and mount the bucket with the data? That seems a bit excessive, but I can't think of another way currently.
You're correct. As of now, there's no way to execute a gsutil command from a Google Cloud Function:
Cloud Functions can be written in Node.js, Python, Go, and Java, and are executed in language-specific runtimes.
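That said, Python is among those runtimes, and the official client libraries can often stand in for gsutil: if the trainer package is already staged in Cloud Storage, an upload-triggered function can submit an AI Platform training job through the REST API. A rough sketch, with hypothetical project, bucket, and module names (google-api-python-client would need to be in requirements.txt):

```python
# main.py — a background Cloud Function triggered by a GCS "finalize" event.
# All resource names below are made up for illustration.
from googleapiclient import discovery


def on_upload(event, context):
    """Submit an AI Platform training job when a file lands in the bucket."""
    bucket, name = event["bucket"], event["name"]
    ml = discovery.build("ml", "v1", cache_discovery=False)
    job = {
        "jobId": f"train_{context.event_id}",  # job IDs must be unique
        "trainingInput": {
            "scaleTier": "BASIC_GPU",
            "packageUris": ["gs://my-bucket/trainer/trainer-0.1.tar.gz"],
            "pythonModule": "trainer.task",
            "region": "us-central1",
            "args": ["--input", f"gs://{bucket}/{name}"],
        },
    }
    ml.projects().jobs().create(parent="projects/my-project", body=job).execute()
```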
I really like your second approach with triggering the DAG.
Another idea that comes to mind is to interact with GCP virtual machines from within Cloud Composer through the PythonOperator, using the Compute Engine Python API. You can find more information on automating infrastructure and a deep technical dive into the core features of Cloud Composer here.
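As a rough illustration of that idea, a DAG could use a PythonOperator that starts a pre-built GPU training VM through the Compute Engine API. This sketch assumes Airflow 1.x (as used by Cloud Composer at the time), and the project, zone, and instance names are made up:

```python
# A sketch of starting an existing training VM from a Cloud Composer DAG.
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from googleapiclient import discovery


def start_training_vm(**kwargs):
    # Uses the Compute Engine API with the environment's default credentials.
    compute = discovery.build("compute", "v1", cache_discovery=False)
    compute.instances().start(
        project="my-project", zone="us-central1-a", instance="gpu-trainer"
    ).execute()


with DAG(
    "start_trainer", start_date=datetime(2021, 1, 1), schedule_interval=None
) as dag:
    PythonOperator(task_id="start_vm", python_callable=start_training_vm)
```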
Another solution you can think of is Kubeflow, which aims to make running ML workloads on Kubernetes simple, portable, and scalable. Kubeflow adds some resources to your cluster to assist with a variety of tasks, including training and serving models and running Jupyter Notebooks. Please have a look at the Codelabs tutorial.
I hope you find the above pieces of information useful.
Is there any way I can convert/add JupyterLab to an existing VM? This VM was created under Compute -> Compute Engine -> VM Instances. When I go to AI Platform -> Notebooks, I do not see this instance, so I'm assuming it is not set up to use JupyterLab. However, the specs of this VM should be more than sufficient to run JupyterLab, so I was hoping to add this functionality. Thank you.
The purpose of AI Platform Notebooks is to be a managed service for Jupyter notebooks. Please have a look at the AI Platform Notebooks documentation:
Managed JupyterLab notebook instances
AI Platform Notebooks is a managed service that offers an integrated and secure JupyterLab environment for data scientists and machine learning developers to experiment, develop, and deploy models into production. Users can create instances running JupyterLab that come pre-installed with the latest data science and machine learning frameworks in a single click.
Also, if you check a managed VM, you'll find the description of its image:
Google, Deep Learning Image: Container Base, m50, A Debian based image with Docker and NVIDIA-Docker support for custom containers with Deep Learning Image integration.
and this image is different from the image you have on your VM.
As a result, unfortunately, you're not able to use your existing VM for such purposes.
You can try to file a feature request at Google Issue Tracker if you really want to be able to do it.
I have searched for an answer to this question, and it is a duplicate, but I need clarification: I looked at two different places and the answers somewhat contradict each other.
The following Stack Overflow answer mentions that Google Cloud AI Platform Notebooks is an upgraded version of Google Cloud Datalab. On the following Quora page, one of the architects mentions that Cloud Datalab is built on top of Jupyter Notebook.
Cloud Datalab adds a new network of its own, while AI Notebooks stays within an existing network. With the current setup of my environment, I do not want the overhead of an extra network and additional security to watch over, so AI Notebooks is the immediate solution. But I would also like to understand the benefits that Cloud Datalab provides.
Between AI Notebook and Cloud Datalab, which should be used and in which scenario?
Is Cloud Datalab also providing pre-installed packages of Python, TensorFlow or R environments like AI Notebooks?
Between AI Notebook and Cloud Datalab, which should be used and in which scenario?
You should use AI Notebooks for new projects in any case, since Cloud Datalab will be deprecated sooner rather than later.
Is Cloud Datalab also providing pre-installed packages of Python, TensorFlow or R environments like AI Notebooks?
Yes it does.
A summary of the differences between the two products:

Datalab:
- Custom UI that is not compatible with the latest JupyterLab extensions.
- Uses the old PyDatalab SDK, since when Datalab was released there were no official SDKs available for many GCP services.
- No major changes on the roadmap.
- Requires SSH with port mapping to use.

Notebooks:
- Uses the JupyterLab UI.
- Uses official SDKs (like the BigQuery Python SDK), and therefore has better integration.
- Since the UI (JupyterLab) is community-driven, new changes are released rapidly.
- Access to the UI is simple; no SSH or CLI usage is required.
- Notebooks API.
- Terraform support.
- Client libraries (Python, Java, Node.js) to manage Notebooks (see the sketch below).
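As a small illustration of those client libraries, here is a sketch that lists notebook instances with the google-cloud-notebooks package (the project and location are assumptions):

```python
# A minimal sketch, assuming the google-cloud-notebooks package is installed
# and application-default credentials are configured.
from google.cloud import notebooks_v1

client = notebooks_v1.NotebookServiceClient()

# Hypothetical project and zone; adjust to your environment.
parent = "projects/my-project/locations/us-central1-a"

for instance in client.list_instances(parent=parent):
    print(instance.name, instance.state)
```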
I'm looking into the best way to set up an end-to-end machine learning pipeline, and evaluating the data exploration component options.
I'm trying to figure out the difference between Google Cloud Datalab and Google Cloud AI Platform Notebooks. They both seem to offer similar functionality, so I'm not sure why they both exist, or whether one is a newer iteration of the other.
If they are different, what is the benefit of one over the other?
Google Cloud AI Platform Notebooks is effectively the upgraded version of Google Cloud Datalab and gives you benefits like being able to use the notebook directly in your browser without having to set up an SSH tunnel first (which Datalab forces you to do).
If you're creating a new notebook, you 100% want to go with AI Platform Notebooks.
(Also, Datalab is now deprecated)
Quoting the End-to-End Machine Learning with TensorFlow on GCP course on Coursera (link) -
AI Platform Notebooks is the next generation of hosted notebook on GCP, and has replaced Cloud Datalab.