Is there any way to disable Google Cloud Functions versioning?
I've tried for a long time to limit the number of versions kept in the Cloud Functions history or, if that's impossible, to disable it completely...
This is something that, at a low level, any infrastructure manager will let you do, but Google intentionally doesn't.
When using Firebase Cloud Functions, there's a lifecycle of a background function. As stated in the documentation:
When you update the function by deploying updated code, instances for older versions are cleaned up along with build artifacts in Cloud Storage and Container Registry, and replaced by new instances.
When you delete the function, all instances and zip archives are cleaned up, along with related build artifacts in Cloud Storage and Container Registry. The connection between the function and the event provider is removed.
There is no need to manually clean up or remove previous versions, as the Firebase deploy scripts do it automatically.
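For instance (myFunction below is just a placeholder name), redeploying is all that's needed; cleanup of the previous version happens as part of the deploy:

# Redeploy a single function; the previous version's instances and
# build artifacts are cleaned up automatically as part of the deploy.
firebase deploy --only functions:myFunction

# Or redeploy every function in the project:
firebase deploy --only functions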
Based on the Cloud Functions Execution Environment:
Cloud Functions run in a fully-managed, serverless environment where
Google handles infrastructure, operating systems, and runtime
environments completely on your behalf. Each Cloud Function runs in
its own isolated secure execution context, scales automatically, and
has a lifecycle independent from other functions.
This means that you should not remove build artifacts, since Cloud Functions scales automatically and new instances are built from these artifacts.
Related
Both seem to be recommended CI/CD tools within Google Cloud, but with similar functionality. When would I use one over the other? Maybe together?
Cloud Build seems to be the de facto tool, while Cloud Deploy says that it can do "pipeline and promotion management."
Both of them are serverless, meaning you don't have to manage the underlying infrastructure of your builds, and both define their delivery pipelines in YAML configuration files. However, Cloud Deploy also requires a Skaffold configuration, which it uses to perform render and deploy operations.
And according to this documentation,
Google Cloud Deploy is a service that automates delivery of your applications to a series of target environments in a defined sequence.
Cloud Deploy is an opinionated, continuous delivery system currently supporting Kubernetes clusters and Anthos. It picks up after the CI process has completed (i.e. the artifact/images are built) and is responsible for delivering the software to production via a progression sequence defined in a delivery pipeline.
While Google Cloud Build is a service that executes your builds on Google Cloud.
Cloud Build (GCB) is Google's cloud Continuous Integration/Continuous Delivery (CI/CD) solution. It takes users' code stored in Cloud Source Repositories, GitHub, Bitbucket, or other solutions; builds it; runs tests; and saves the results to an artifact repository such as Google Container Registry, Artifactory, or a Google Cloud Storage bucket. It also supports complex builds with multiple steps, for example testing and deployments. If you want to extend your CI pipeline, it's as easy as adding an additional step to it. You can take your artifacts, either built or stored locally or at your destination, and easily deploy them to many services with a deployment strategy of your choice.
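As a rough illustration of how small a Cloud Build pipeline can be, this sketch writes a minimal cloudbuild.yaml (the image name, test command, and Python version are placeholder assumptions) and submits it with gcloud:

# Minimal two-step pipeline: run tests, then build and push an image.
# gcr.io/$PROJECT_ID/my-app and the pytest step are placeholder examples.
cat > cloudbuild.yaml <<'EOF'
steps:
  - name: 'python:3.11'
    entrypoint: 'bash'
    args: ['-c', 'pip install -r requirements.txt && pytest']
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'gcr.io/$PROJECT_ID/my-app', '.']
images:
  - 'gcr.io/$PROJECT_ID/my-app'
EOF

# Submit the build from the current directory.
gcloud builds submit --config=cloudbuild.yaml .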
Choosing between the two services will still depend on your use case, so provide more details if you can. However, their stated objectives might make the choice easier:
Cloud Build's mission is to help GCP users build better software
faster, more securely by providing a CI/CD workflow automation product for
developer teams and other GCP services.
Cloud Deploy's mission is to make it easier to set up and run continuous
software delivery to a Google Kubernetes Engine environment.
In addition, refer to the pricing documentation for both products: Cloud Build pricing and Cloud Deploy pricing.
I need to run a data transformation pipeline composed of several scripts that live in distinct projects, i.e. separate Python repos.
I am thinking of using Compute Engine to run these scripts in VMs when needed, as it lets me manage the required resources.
I need to be able to orchestrate these scripts, in the sense that I want to run the steps sequentially and sometimes asynchronously.
I see that GCP provides a Workflows component which seems to suit this case.
I am thinking of creating a specific project to orchestrate the executions of scripts.
However, I cannot see how to trigger the execution of my scripts, which will not be in the same repo as the orchestrator project. From what I understand of GCE, VMs are only created when scripts are executed and expose no persistent HTTP endpoint that could be called to trigger the execution from elsewhere.
To illustrate, let's say I have two projects, step_1 and step_2, which contain separate steps of my data transformation pipeline.
I would also have an orchestrator project whose only purpose is to trigger step_1 and step_2 sequentially in VMs with GCE. This project would not have access to the code repos of the two other projects.
What would be the best practice in this case? Should I use components other than GCE and Workflows for this, or is there a way to trigger scripts in GCE from an independent orchestration project?
One possible solution would be not to use GCE (Google Compute Engine) but instead to create Docker containers that contain your task steps. These would then be registered with Cloud Run. Cloud Run spins up Docker containers on demand and charges you only for the time you spend processing a request. When the request ends, you are no longer charged, and hence you are consuming resources optimally. Various events can cause a request in Cloud Run, but the most common is a REST call.
With this in mind, now assume that your Python code is packaged in a container which is triggered by a REST server (e.g. Flask). Effectively you have created "microservices". These services can then be orchestrated by Cloud Workflows. The invocation of these microservices is through REST endpoints, which can be Internet addresses with authorization also present. This would allow the microservices (tasks/steps) to be located in separate GCP projects, and the orchestrator would see them as distinct callable endpoints.
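A rough sketch of that setup, assuming the two steps are already packaged as container images (the service names, image paths, URLs, and project IDs below are placeholders), could look like this:

# Deploy each step as a private Cloud Run service in its own project.
gcloud run deploy step1 --image=gcr.io/step-1-project/step1 \
    --project=step-1-project --region=us-central1 --no-allow-unauthenticated
gcloud run deploy step2 --image=gcr.io/step-2-project/step2 \
    --project=step-2-project --region=us-central1 --no-allow-unauthenticated

# Define a workflow in the orchestrator project that calls the two
# services sequentially, authenticating with its service account (OIDC).
cat > pipeline.yaml <<'EOF'
main:
  steps:
    - run_step_1:
        call: http.post
        args:
          url: https://step1-xxxxx-uc.a.run.app/ # placeholder Cloud Run URL
          auth:
            type: OIDC
    - run_step_2:
        call: http.post
        args:
          url: https://step2-xxxxx-uc.a.run.app/ # placeholder Cloud Run URL
          auth:
            type: OIDC
EOF

gcloud workflows deploy pipeline --source=pipeline.yaml \
    --project=orchestrator-project --location=us-central1
gcloud workflows run pipeline \
    --project=orchestrator-project --location=us-central1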
Other potential solutions to look at would be GKE (Kubernetes) and Cloud Composer (Apache Airflow).
If you DO wish to stay with Compute Engine, you can still do that using a shared VPC. A shared VPC would allow distinct projects to have network connectivity between each other, and you could use Private Catalog to have the GCE instances advertise themselves to each other. You could then have a GCE instance do the orchestration or, again, orchestrate through Cloud Workflows. We would have to check that Cloud Workflows supports parallel items ... I do not believe that, as of the time of this post, it does.
This is a common request: organizing automation into its own project. You can set up a service account that spans multiple projects.
See a tutorial here: https://gtseres.medium.com/using-service-accounts-across-projects-in-gcp-cf9473fef8f0
On top of that, you can also consider having Workflows in both the orchestrator and the sub-level projects. This way the orchestrator workflow can call another workflow, so each job can be run and encapsulated within the project that holds the code and the workflow body, and only the triggering comes from the other project.
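A hedged sketch of the IAM side of that idea, using placeholder project, workflow, and service account names:

# Service account that the orchestrator workflow runs as (placeholder name).
ORCH_SA=orchestrator-wf@orchestrator-project.iam.gserviceaccount.com

# Allow that service account to invoke workflows in the step_1 project.
gcloud projects add-iam-policy-binding step-1-project \
    --member="serviceAccount:${ORCH_SA}" \
    --role="roles/workflows.invoker"

# The orchestrator workflow can then start the sub-level workflow through
# the Workflows executions connector; from the CLI you can test the
# permission by impersonating the same service account.
gcloud workflows run step-1-pipeline \
    --project=step-1-project --location=us-central1 \
    --impersonate-service-account="${ORCH_SA}"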
I understand that using AWS Lambda allows us to submit single functions to a runtime and have them execute when needed. But what about the software these functions depend on? Where do these get installed? Does the installation and configuration happen every time the lambda instance gets spun up? Wouldn't this take a while for larger applications/detailed configurations?
Or does the installed software sit on the server (say on an EC2 instance) and then simply gets called upon as needed by the lambda functions?
There are essentially two ways to manage dependencies of a Lambda function.
Using Lambda layers: A Lambda layer is an archive containing additional code, such as libraries, dependencies, or even custom runtimes. When you include a layer in a function, the contents are extracted to the /opt directory in the execution environment. You can include up to five layers per function, which count towards the standard Lambda deployment size limits. Have a look at this article for more details; a CLI sketch covering both approaches follows below.
Using container images: You can package your code and dependencies as a container image using tools such as the Docker command line interface (CLI). You can then upload the image to your container registry hosted on Amazon Elastic Container Registry (Amazon ECR). See the official docs here.
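A rough CLI sketch of both options (the function, layer, repository, and role names, the account ID, and the region below are all placeholders):

# Option 1: publish dependencies as a layer and attach it to a function.
# layer.zip is assumed to contain a python/ directory with the installed packages.
aws lambda publish-layer-version \
    --layer-name my-deps \
    --zip-file fileb://layer.zip \
    --compatible-runtimes python3.12

aws lambda update-function-configuration \
    --function-name my-function \
    --layers arn:aws:lambda:us-east-1:123456789012:layer:my-deps:1

# Option 2: ship the code plus its dependencies as a container image in ECR.
aws ecr create-repository --repository-name my-function
aws ecr get-login-password --region us-east-1 | \
    docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
docker build -t 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-function:latest .
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-function:latest

aws lambda create-function \
    --function-name my-container-function \
    --package-type Image \
    --code ImageUri=123456789012.dkr.ecr.us-east-1.amazonaws.com/my-function:latest \
    --role arn:aws:iam::123456789012:role/my-lambda-execution-role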
Because Lambda can scale to zero, it suffers from a so-called cold start issue. This means that unless there is a warm, running container instance available, Lambda has to "cold start" a new container, causing some delay, especially for large-footprint application stacks such as JVM-based ones.
We built an API with a Cloud Function integrated as the backend. Until now we have been deploying the Cloud Function first and API Gateway later. Is there a better way to combine these two services and deploy them as a whole?
They are two different products, and no, you can't tie them together and deploy them at the same time.
Cloud Functions builds a container based on your code, and that can take more or less time depending on the number of dependencies and the type of language (whether it requires compilation, like Java or Go, or not).
API Gateway requires you to create a new config and then deploy it on a gateway, and that takes a while to achieve.
So there is no link between the products, and they don't have the same deployment duration. The right pattern here is to use versioning. For a minor change (one that doesn't break the existing behavior), you can deploy one service before the other (Cloud Functions before API Gateway).
For a breaking change, I recommend not updating the existing function but creating a new one. The advantage is that you can keep the two versions running in parallel, with a rapid rollback in case of issues. The same goes for API Gateway: create a new gateway for a new version.
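As a hedged sketch of that pattern for a breaking change (the function, API, config, and gateway names plus the runtime and region are placeholder assumptions):

# Deploy the new function version alongside the old one.
gcloud functions deploy my-backend-v2 \
    --runtime=nodejs20 --trigger-http --region=us-central1

# Create a new API config whose OpenAPI spec points at my-backend-v2.
gcloud api-gateway api-configs create my-api-config-v2 \
    --api=my-api --openapi-spec=openapi-v2.yaml

# Create a new gateway for the new version, keeping the old one
# running in parallel so you can roll back quickly if needed.
gcloud api-gateway gateways create my-gateway-v2 \
    --api=my-api --api-config=my-api-config-v2 --location=us-central1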
Google Cloud needs APIs to be enabled before many things can be done.
Enabling takes just one CLI command and is usually very fast. The CLI even offers to enable an API if I try to do something that requires one that isn't enabled yet. But it still interrupts development.
My question is: why are they not enabled by default? And is it OK to enable them all right after creating a new project, so I don't have to bother with enabling them later?
I would like to understand the purpose of this design and learn the best practices.
Well, they're disabled mainly so that you don't incur costs you weren't intending to, so that you're aware of which service you're using at which point, and so that you can track the usage and costs of each of them.
Also, some services like Pub/Sub are dependent on others, and some, such as Container Registry (or Artifact Registry), require a Cloud Storage bucket for artifacts to be stored and will create one automatically if you're pushing a Docker image or using Cloud Build. These are things to be aware of.
Enabling an API takes a bit of time depending on the service, yes, but it's a one-time action per project. I'm not sure what exactly your concerns about the waiting time are, but if you want to keep working while a gcloud command enables some APIs, you can use the --async flag, which runs the operation in the background so you don't have to wait for it to complete before running another command.
Lastly, sure, you can just enable them all if you know what you're doing, but at your own risk; it's a safer route to enable just the ones you need, and as you might already be aware, you can enable multiple APIs in a single gcloud command. In the example of Container Registry, it uses Cloud Storage, for which you will still be billed.
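For example, enabling a handful of commonly used APIs in one command (this particular list is only an illustration):

gcloud services enable \
    run.googleapis.com \
    cloudbuild.googleapis.com \
    artifactregistry.googleapis.com \
    pubsub.googleapis.com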
Enabling services enables access to (often billed) resources.
It's considered good practice to keep this "surface" of resources constrained to those that you(r customers) need; the more services you enable, the greater your potential attack surface and potential bills.
Google provides an increasing number of services (accessible through APIs). It is highly unlikely that you would ever want to access them all.
APIs are enabled by Project. The Project creation phase (including enabling services) is generally only a very small slice of the entire lifetime of a Project; even of those Projects created-and-torn-down on demand.
It's possible to enable the APIs asynchronously, permitting you to enable-not-block each service:
for SERVICE in "containerregistry" "container" "cloudbuild" # ...and whichever others you need
do
  gcloud services enable "${SERVICE}.googleapis.com" --project="${PROJECT}" --async
done
Following on from this, it is good practice to automate your organization's project provisioning (scripts, Terraform, Deployment Manager, etc.). This provides a baseline template for how your projects are created, which services are enabled, default permissions, and so on. Then your developers simply fire-and-forget a provisioner (hopefully also checked in to your source control), drink a coffee, and wait while these steps are done for them.
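A minimal sketch of such a provisioner, with a placeholder project ID, billing account, and service list:

#!/usr/bin/env bash
set -euo pipefail

PROJECT="my-new-project"                # placeholder project ID
BILLING_ACCOUNT="000000-AAAAAA-000000"  # placeholder billing account ID

# Create the project and attach billing so paid services can be enabled.
gcloud projects create "${PROJECT}"
gcloud billing projects link "${PROJECT}" --billing-account="${BILLING_ACCOUNT}"

# Enable the baseline set of services without blocking on each one.
for SERVICE in "containerregistry" "container" "cloudbuild"
do
  gcloud services enable "${SERVICE}.googleapis.com" --project="${PROJECT}" --async
done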