What is the difference between Cloud Build and Cloud Deploy? - google-cloud-platform

They both seem to be recommended CI/CD tools within Google Cloud.. but with similar functionality. Would I use one over the other? Maybe together?
Cloud Build seems to be the de facto tool. While Cloud Deploy says that it can do "pipeline and promotion management."

Both of them are designed as serverless, meaning you don't have to manage the underlying infrastructure of your builds and defining delivery pipelines in a YAML configuration file. However, Cloud Deploy needs a configuration for Skaffold, which Google Cloud Deploy needs in order to perform render and deploy operations.
And according to this documentation,
Google Cloud Deploy is a service that automates delivery of your applications to a series of target environments in a defined sequence.
Cloud Deploy is an opinionated, continuous delivery system currently supporting Kubernetes clusters and Anthos. It picks up after the CI process has completed (i.e. the artifact/images are built) and is responsible for delivering the software to production via a progression sequence defined in a delivery pipeline.
While Google Cloud Build is a service that executes your builds on Google Cloud.
Cloud Build (GCB) is Google's cloud Continuous Integration/Continuous Development (CICD) solution. And takes users code stored in Cloud Source Repositories, GitHub, Bitbucket, or other solutions; builds it; runs tests; and saves the results to an artifact repository like Google Container Registry, Artifactory, or a Google Cloud Storage bucket. Also, supports complex builds with multiple steps, for example, testing and deployments. If you want to add your CI pipeline, it's as easy as adding an additional step to it. Take your Artifacts, either built or stored locally or at your destination and easily deploy it to many services with a deployment strategy of you choice.
Provide more details in order to choose between the two services and it will still depend on your use case. However, their objectives might help to make it easier for you to choose between the two services.
Cloud Build's mission is to help GCP users build better software
faster, more securely by providing a CI/CD workflow automation product for
developer teams and other GCP services.
Cloud Deploy's mission is to make it easier to set up and run continuous
software delivery to a Google Kubernetes Engine environment.
In addtion, refer to this documentation for price information, Cloud Build pricing and Cloud Deploy pricing.

Related

GCP components to orchestrate crons running in GCE (Google Workflows?)

I need to run a pipeline of data transformation that is composed of several scripts in distinct projects = Python repos.
I am thinking of using Compute Engine to run these scripts in VMs when needed as I can manage resources required.
I need to be able to orchestrate these scripts in the sense that I want to run steps sequentially and sometimes asyncronously.
I see that GCP provides us with a Worflows components which seems to suit this case.
I am thinking of creating a specific project to orchestrate the executions of scripts.
However I cannot see how I can trigger the execution of my scripts which will not be in the same repo as the orchestrator project. From what I understand of GCE, VMs are only created when scripts are executed and provide no persistent HTTP endpoints to be called to trigger the execution from elsewhere.
To illustrate, let say I have two projects step_1 and step_2 which contain separate steps of my data transformation pipeline.
I would also have a project orchestrator with the only use of triggering step_1 and step_2 sequentially in VMs with GCE. This project would not have access to the code repos of these two former projects.
What would be the best practice in this case? Should I use other components than GCE and Worflows for this or there is a way to trigger scripts in GCE from an independent orchestration project?
One possible solution would be to not use GCE (Google Compute Engines) but instead create Docker containers that contain your task steps. These would then be registered with Cloud Run. Cloud Run spins up docker containers on demand and charges you only for the time you spend processing a request. When the request ends, you are no longer charged and hence you are optimally consuming resources. Various events can cause a request in Cloud Run but the most common is a REST call. With this in mind, now assume that your Python code is now packaged in a container which is triggered by a REST server (eg. Flask). Effectively you have created "microservices". These services can then be orchestrated by Cloud Workflows. The invocation of these microservices is through REST endpoints which can be Internet addresses with authorization also present. This would allow the microservices (tasks/steps) to be located in separate GCP projects and the orchestrator would see them as distinct callable endpoints.
Other potentials solutions to look at would be GKE (Kubernetes) and Cloud Composer (Apache Airflow).
If you DO wish to stay with Compute Engines, you can still do that using shared VPC. Shared VPC would allow distinct projects to have network connectivity between each other and you could use Private Catalog to have the GCE instances advertize to each other. You could then have a GCE instance choreograph or, again, choreograph through Cloud Workflows. We would have to check that Cloud Workflows supports parallel items ... I do not believe that as of the time of this post it does.
This is a common request, to organize automation into it's own project. You can setup service account that spans multiple projects.
See a tutorial here: https://gtseres.medium.com/using-service-accounts-across-projects-in-gcp-cf9473fef8f0
On top of that, you can also think to have Workflows in both orchestrator and sublevel project. This way the orchestrator Workflow can call another Workflow. So the job can be easily run, and encapsuled also under the project that has the code + workflow body, and only the triggering comes from other project.

What is the difference between GCP cloud composer and workflow?

The cloud workflow doesn't come with a scheduling feature. Apart from that, what are all the differences between these two services in terms of features? In which use case should we prefer the workflow over composer or vice versa?
There are some key differences to consider when choosing between the two solutions :
A Composer instance needs to be in a running state to trigger DAGs and you'll also need to size your Cloud Composer instance based on your usage, You do not need to do this in Cloud Workflows as it is a Serverless service and you pay for anytime a workflow is triggered
Another key difference is that Cloud Composer is really convenient for writing and orchestrating data pipelines because of it's internal scheduler and also because of the provided Operators, You can interact with any Data services inside of GCP.
However, Cloud Workflows interacts with Cloud Functions, wich is a task that Composer cannot do really well.
Both Composer and Workflows support orchestrating multiple services and can handle long running workflows. Despite there being some overlap in the capabilities of these products, each has differentiators that make them well suited to particular use cases.
Composer is most commonly used for orchestrating the transformation of data as part of ELT or data engineering. Workflows, in contrast, is focused on the orchestration of HTTP-based services built with Cloud Functions, Cloud Run, or external APIs.
Composer is designed for orchestrating batch workloads that can handle a delay of a few seconds between task executions. It wouldn’t be suitable if low latency was required in between tasks, whereas Workflows is designed for latency sensitive use cases.
While you don’t have to worry about maintaining Airflow deployments in Composer, you do need to specify how many workers you need for a given Composer environment. Workflows is completely serverless; there is no infrastructure to manage or scale.
For further information refer to this google blog article and this one.

Triggering a training task on cloud ml when file arrives to cloud storage

I am trying to build an app where the user is able to upload a file to cloud storage. This would then trigger a model training process (and predicting later on). Initially I though I could do this with cloud functions/pubsub and cloudml, but it seems that cloud functions are not able to trigger gsutil commands which is needed for cloudml.
Is my only option to enable cloud-composer and attach GPUs to a kubernetes node and create a cloud function that triggers a dag to boot up a pod on the node with GPUs and mounting the bucket with the data? Seems a bit excessive but I can't think of another way currently.
You're correct. As for now, there's no possibility to execute gsutil command from a Google Cloud Function:
Cloud Functions can be written in Node.js, Python, Go, and Java, and are executed in language-specific runtimes.
I really like your second approach with triggering the DAG.
Another idea that comes to my mind is to interact with GCP Virtual Machines within Cloud Composer through the Python operator by using the Compute Engine Pyhton API. You can find more information in automating infrastructure and taking a deep technical dive into the core features of Cloud Composer here.
Another solution that you can think of is Kubeflow, which aims to make running ML workloads on Kubernetes. Kubeflow adds some resources to your cluster to assist with a variety of tasks, including training and serving models and running Jupyter Notebooks. Please, have a look on Codelabs tutorial.
I hope you find the above pieces of information useful.

What is the difference between GCP Kubeflow and GCP cloud composer?

I am learning GCP, and came across Kuberflow and Google Cloud Composer.
From what I have understood, it seems that both are used to orchestrate workflows, empowering the user to schedule and monitor pipelines in the GCP.
The only difference that I could figure out is that Kuberflow deploys and monitors Machine Learning models. Am I correct? In that case, since Machine Learning models are also objects, can't we orchestrate them using Cloud Composer? How does Kubeflow help in any way, better than Cloud Composer when it comes to managing Machine Learning models??
Thanks
Kubeflow and Kubeflow Pipelines
Kubeflow is not exactly the same as Kubeflow Pipelines. The Kubeflow project mostly develops Kubernetes operators for distributed ML training (TFJob, PyTorchJob). On the other hand the Pipelines project develops a system for authoring and running pipelines on Kubernetes. KFP also has some sample components, by the main product is the pipeline authoring SDK and the pipeline execution engine
Kubeflow Pipelines vs. Cloud Composer
The projects are pretty similar, but there are differences:
KFP use Argo for execution and orchestration. Cloud Composer uses Apache Airflow.
KFP/Argo is designed for distributed execution on Kubernetes. Cloud Composer/Apache Airflow are more for single-machine execution.
KFP/Argo are language-agnostic - components can use any language (components describe containerized command-line programs). Cloud Composer/Apache Airflow use Python (Airflow operators are defined as Python classes).
KFP/Argo have concept of data passing. Every component has inputs and outputs and pipleine connects them into a data passing graph. Cloud Composer/Apache Airflow do not really have data passing (Airflow has global variable storage and XCom, but it's not the same thing as explicit data passing) and the pipeline is a task dependency graph rather than mostly data dependency graph (KFP can also have task dependencies, but usually they're not needed).
KFP supports execution caching feature that skips execution of tasks that have already been executed before.
KFP records all artifacts produced by pipeline runs in ML Metadata database.
KFP has experimental adapter which allows using Airflow operators as components.
KFP has large fast-growing ecosystem of custom components.
Kubeflow is a platform for developing and deploying a machine learning (ML) systems. Its components are focused on creating workflows aimed to build ML systems.
Cloud Composer provides the infraestructure to run Apache Airflow worflows. Its components are known as Airflow Operators and the workflows are connections between these operators that are known as DAGs.
Both services run on Kubernetes, but they are based on different programming frameworks; therefore, you are correct, Kuberflow deploys and monitors Machine Learning models. See below the answer for your questions:
In that case, since Machine Learning models are also objects, can't we orchestrate them using Cloud Composer?
You would need to find an operator that meet your needs, or create a custom operator with the structure required to create a model, see this example. Even when it can be performed, this could be more difficult that using Kubeflow.
How does Kubeflow help in any way, better than Cloud Composer when it comes to managing Machine Learning models??
Kubeflow hides complexity as it is focused on Machine Learninig models. The frameworks specialized on machine learning makes those things easier than using Cloud Composer which in this context can be considered as a general purpose tool (focused on linking existing services supported by the Airflow Operators).
Taking this straight from kubeflow.org
The Kubeflow project is dedicated to making deployments of machine
learning (ML) workflows on Kubernetes simple, portable and scalable.
Our goal is not to recreate other services, but to provide a
straightforward way to deploy best-of-breed open-source systems for ML
to diverse infrastructures. Anywhere you are running Kubernetes, you
should be able to run Kubeflow.
And as you can see it is a suite made of many software that are useful in the life cycle of a ML model. It comes with tensorflow, jupiter, etc.
Now the real deal, when it comes to Kubeflow is "easy deploy of a ML model at scale on a Kubernetis cluster".
However on GCP you already a ML suite in cloud, datalab, cloud build etc. So I don't know how much efficient will be sinning up a kubernetis cluster if you don't need the "portability" factor.
Cloud Composer is the real deal while taking about orchestration of a workflow. It is a "managed" version of Apache Airflow and it is ideal for any "simple" workflow that changes a lot, since you can change it via a visual UI and with python.
It is also ideal to automate infrastructure operations:

What are the deployment tools used in hybrid cloud

I want to setup hybrid cloud and choose OpenStack as private cloud and AWS,RackSpace OpenStack as public clouds. what tools I need to choose for the automatic provisioning and configuration of cloud resources.
thanks
There are several tools that could help you with provisioning, management, deployment and post deployment support such as Rightscale, Scalr and Xervmon. Xervmon is one tool you should give it a try. It supports multi cloud provisioning, management, deployment and monitoring of these wide array of assets. XOPS - Xervmon Operation Center is an integrated tool that leverages puppet classes to provision and automate routine tasks across multi cloud providers.
Xervmon takes an integrated approach to the whole gamut of cloud management, monitoring across hybrid infrastructure. Xervmon