What is the difference between AI Notebook and Cloud Datalab in GCP? - google-cloud-platform

I have searched for an answer to this question, and while it may be a duplicate, I need clarification because the two places I looked at give somewhat opposite answers.
The following Stack Overflow answer mentions that Google Cloud AI Platform Notebooks is an upgraded version of Google Cloud Datalab. On the following Quora page, one of the architects mentions that Cloud Datalab is built on top of Jupyter Notebook.
Cloud Datalab adds a new network of its own, while AI Notebooks stays within an existing network. With the current setup of my environment, I do not want the overhead of maintaining and securing an extra network, so AI Notebooks is the immediate solution. But I would also like to understand the benefits that Cloud Datalab provides.
Between AI Notebook and Cloud Datalab, which should be used and in which scenario?
Is Cloud Datalab also providing pre-installed packages of Python, TensorFlow, or R environments like AI Notebooks?

Between AI Notebook and Cloud Datalab, which should be used and in which scenario?
You should use AI Notebooks for new projects in any case, since Cloud Datalab will be deprecated sooner rather than later.
Is Cloud Datalab also providing pre-installed packages of Python, TensorFlow, or R environments like AI Notebooks?
Yes it does.
Summary of the differences between the two products.
Datalab:
Custom UI that is not compatible with the latest JupyterLab extensions.
Uses the old PyDatalab SDK, since when Datalab was released there were no official SDKs available for many GCP services.
No major changes on the roadmap.
Requires SSH with port mapping to use.
Notebooks:
Uses the JupyterLab UI.
Uses official SDKs (like the BigQuery Python SDK), therefore better integration (see the sketch after this list).
Since the UI (JupyterLab) is community driven, new changes are released rapidly.
Access to the UI is simple; no SSH or CLI usage is required.
Notebooks API.
Terraform support.
Client libraries (Python, Java, Node.js) to manage Notebooks.
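As a rough illustration of the SDK point above, a notebook cell on AI Platform Notebooks can query BigQuery through the official client library, which comes pre-installed on the default images. This is only a minimal sketch; it assumes the instance's service account has BigQuery read access and queries a public dataset.

```python
# Minimal sketch: using the official BigQuery Python SDK from an
# AI Platform Notebooks cell (the library is pre-installed on the
# default images). Assumes the instance's service account can read BigQuery.
from google.cloud import bigquery

client = bigquery.Client()  # the project is picked up from the VM metadata

query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 10
"""
df = client.query(query).to_dataframe()  # pandas is also pre-installed
print(df)
```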

Related

Triggering a training task on cloud ml when file arrives to cloud storage

I am trying to build an app where the user is able to upload a file to cloud storage. This would then trigger a model training process (and prediction later on). Initially I thought I could do this with cloud functions/pubsub and cloudml, but it seems that cloud functions are not able to trigger gsutil commands, which is needed for cloudml.
Is my only option to enable cloud-composer and attach GPUs to a kubernetes node and create a cloud function that triggers a dag to boot up a pod on the node with GPUs and mount the bucket with the data? Seems a bit excessive, but I can't think of another way currently.
You're correct. As of now, there's no way to execute a gsutil command from a Google Cloud Function:
Cloud Functions can be written in Node.js, Python, Go, and Java, and are executed in language-specific runtimes.
I really like your second approach with triggering the DAG.
Another idea that comes to mind is to interact with GCP virtual machines from Cloud Composer through the PythonOperator, using the Compute Engine Python API. You can find more information on automating infrastructure, and a deeper technical dive into the core features of Cloud Composer, here.
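A minimal sketch of that idea, assuming an Airflow 1.10-based Composer environment with the google-api-python-client package installed; the project, zone, and instance names ("my-project", "us-central1-a", "training-vm") are placeholders:

```python
# Sketch: a Cloud Composer (Airflow 1.10) DAG that starts a Compute Engine VM
# through the Compute Engine API from a PythonOperator. Project, zone and
# instance names below are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from googleapiclient import discovery


def start_training_vm(**kwargs):
    compute = discovery.build("compute", "v1")
    compute.instances().start(
        project="my-project", zone="us-central1-a", instance="training-vm"
    ).execute()


with DAG(
    dag_id="start_gpu_training",
    start_date=datetime(2020, 1, 1),
    schedule_interval=None,  # triggered externally, e.g. by a Cloud Function
) as dag:
    start_vm = PythonOperator(
        task_id="start_training_vm",
        python_callable=start_training_vm,
    )
```

The Composer environment's service account would also need permission to start Compute Engine instances for this to work.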
Another solution you can think of is Kubeflow, which aims to make it simple to run ML workloads on Kubernetes. Kubeflow adds some resources to your cluster to assist with a variety of tasks, including training and serving models and running Jupyter Notebooks. Please have a look at the Codelabs tutorial.
I hope you find the above pieces of information useful.

Converting a regular Compute Engine VM instance to an AI Platform Notebook instance

Is there any way I can convert/add JupyterLab to an existing VM? This VM was created under Compute -> Compute Engine -> VM Instances. When I go to AI Platforms -> Notebooks, I do not see this instance, so I'm assuming it is not set up to use JupyterLab. However, the settings on this VM should be more than sufficient to run JupyterLab, so I was hoping to add this functionality. Thank you.
The purpose of AI Platform Notebooks is to be a managed service for Jupyter notebooks. Please have a look at the AI Platform Notebooks documentation:
Managed JupyterLab notebook instances
AI Platform Notebooks is a managed service that offers an integrated and secure JupyterLab environment for data scientists and machine learning developers to experiment, develop, and deploy models into production. Users can create instances running JupyterLab that come pre-installed with the latest data science and machine learning frameworks in a single click.
Also, if you check the managed VM you'll find the description of the image:
Google, Deep Learning Image: Container Base, m50, A Debian based image with Docker and NVIDIA-Docker support for custom containers with Deep Learning Image integration.
and this image is different from the image you have on your VM.
As a result, unfortunately, you're not able to use your existing VM for this purpose.
You can try to file a feature request on the Google Issue Tracker if you really want to be able to do it.

Where to keep the Dataflow and Cloud Composer Python code?

It is probably a silly question. In my project we'll be using Dataflow and Cloud Composer. For that, I had asked for permission to create a VM instance in the GCP project to keep both the Dataflow and Cloud Composer Python programs. But the client asked me the reason for creating a VM instance and told me that you can execute Dataflow without a VM instance.
Is that possible? If yes, how do I achieve it? Can anyone please explain? It would be really helpful to me.
You can run Dataflow pipelines or manage Composer environments from your own computer once your credentials are authenticated and you have both the Google Cloud SDK and the Dataflow (Apache Beam) Python library installed. However, this depends on how you want to manage your resources. I prefer to use a VM instance so that all the resources I use are in the cloud, where it is easier to set up VPC networks that include different services. Also, saving data from a VM instance into GCS buckets is usually faster than from an on-premises computer/server.
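For example, a minimal Beam pipeline can be submitted to Dataflow from a local machine roughly like this. It is only a sketch: it assumes `apache-beam[gcp]` is installed, application-default credentials are set up (e.g. via `gcloud auth application-default login`), and the project, region, and bucket names are replaced with real ones.

```python
# Sketch: submitting a simple Apache Beam pipeline to Dataflow from a local
# machine. Assumes `pip install "apache-beam[gcp]"` and authenticated
# application-default credentials; all names below are placeholders.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "Create" >> beam.Create(["hello", "dataflow"])
        | "Uppercase" >> beam.Map(str.upper)
        | "Write" >> beam.io.WriteToText("gs://my-bucket/output/result")
    )
```

Dropping the runner option (or setting it to "DirectRunner") runs the same pipeline locally, which is handy for testing before submitting it to Dataflow.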

How do you schedule GCP AI Platform notebooks via Google Cloud Composer?

I've been tasked with automating the scheduling of some daily notebooks that run on AI Platform Notebooks via the Papermill operator, but actually doing this through Cloud Composer is giving me some trouble.
Any help is appreciated!
The first step is to create a JupyterLab notebook. If you want to use additional libraries, install them and restart the kernel (the Restart Kernel and Clear All Outputs option). Then define the processing inside your notebook.
When it's ready, remove all the runs, peeks, and dry runs before you start the scheduling phase.
Now you need to set up the Cloud Composer environment (remember to install the additional packages that you defined in the first step). To schedule the workflow, go to JupyterLab and create a second notebook which generates the DAG from the workflow.
The final step is to upload the zipped workflow to the Cloud Composer DAGs folder. You can manage your workflow using the Airflow UI.
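A rough sketch of such a DAG using Airflow's PapermillOperator is shown below; the bucket, notebook paths, and schedule are placeholders, and the exact import path depends on your Airflow version.

```python
# Sketch: scheduling a parameterized notebook with the PapermillOperator on
# Cloud Composer. Bucket and notebook paths are placeholders; papermill (and
# gcsfs for gs:// paths) must be installed in the Composer environment.
from datetime import datetime

from airflow import DAG
# On Airflow 2.x with the papermill provider, the import is instead:
# from airflow.providers.papermill.operators.papermill import PapermillOperator
from airflow.operators.papermill_operator import PapermillOperator

with DAG(
    dag_id="daily_notebook_run",
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    run_notebook = PapermillOperator(
        task_id="run_notebook",
        input_nb="gs://my-bucket/notebooks/analysis.ipynb",
        output_nb="gs://my-bucket/notebooks/output/analysis-{{ ds }}.ipynb",
        parameters={"run_date": "{{ ds }}"},
    )
```

Papermill writes the executed notebook, with all cell outputs, to the output path, which makes it easy to inspect each daily run afterwards.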
I recommend you take a look at this article.
Another solution that you can use is Kubeflow, which aims to make it simple to run ML workloads on Kubernetes. Kubeflow adds some resources to your cluster to assist with a variety of tasks, including training and serving models and running Jupyter Notebooks. You can find an interesting tutorial on Codelabs.
I hope you find the above pieces of information useful.
This blog post on Medium, "How to Deploy and Schedule Jupyter Notebook on Google Cloud Platform", describes how to run Jupyter notebook jobs on a Compute Engine Instance and schedule it using GCP's Cloud Scheduler > Cloud Pub/Sub > Cloud Functions. (Unfortunately the post may be paywalled.)
If you must use Cloud Composer, then you might find this answer to a related question, "ETL in Airflow aided by Jupyter Notebooks and Papermill," useful.

What is the difference between Google Cloud Datalab and Google Cloud AI Platform Notebooks?

I'm looking into the best way to set up an end-to-end machine learning pipeline, and evaluating the data exploration component options.
I'm trying to figure out the difference between Google Cloud Datalab and Google Cloud AI Platform Notebooks. They both seem to offer similar functionality, so I'm not sure why they both exist, or whether one is a new iteration of the other.
If they are different, what is the benefit of one over the other?
Google Cloud AI Platform Notebooks is effectively the upgraded version of Google Cloud Datalab and gives you benefits like being able to use the notebook directly in your browser without having to set up an SSH tunnel first (which Datalab forces you to do).
If you're creating a new notebook, you 100% want to go with AI Platform Notebooks.
(Also, Datalab is now deprecated.)
Quoting the End-to-End Machine Learning with TensorFlow on GCP course on Coursera (link) -
AI Platform Notebooks is the next generation of hosted notebook on GCP, and has replaced Cloud Datalab.