I am building a Python app on Google Cloud that involves delayed execution of tasks.
It seems Cloud Tasks is limited to App Engine.
Can we use Cloud Tasks from GCE VMs, or from containers running on GCP or other clouds' VMs?
Even the Google docs only cover push queues with App Engine.
Does Cloud Tasks support pull queues?
[EDIT]
I tried looking at their cloud discovery files. v2beta1 has pull references but v2 does not. I believe GCP doesn't want to support this in the future :-(.
Cloud Tasks does not support pull queues, but it just launched a beta feature for HTTP targets, which allows Cloud Tasks to push tasks to any HTTP endpoint. There's even functionality for Cloud Tasks to include an authentication token based on an associated service account: https://cloud.google.com/tasks/docs/creating-http-target-tasks
This would allow you to push to GCE, or really any service that can operate as a webhook. If you were to use the new Cloud Run Beta product, verifying these tokens is handled for you.
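For illustration, here is a hedged sketch using the Python client (google-cloud-tasks). The project, location, queue, handler URL, and service account are all placeholders, and note that at the time of this answer the HTTP-target feature was still beta:

```python
from google.cloud import tasks_v2

client = tasks_v2.CloudTasksClient()

# Placeholder project, location, and queue names.
parent = client.queue_path("my-project", "us-central1", "my-queue")

task = {
    "http_request": {
        "http_method": tasks_v2.HttpMethod.POST,
        # Any reachable HTTP endpoint works: a GCE VM, Cloud Run, a webhook, etc.
        "url": "https://my-vm.example.com/task-handler",
        "headers": {"Content-Type": "application/json"},
        "body": b'{"job": "nightly-report"}',
        # Cloud Tasks attaches an OIDC token minted for this service account,
        # so the receiving service can verify the caller.
        "oidc_token": {
            "service_account_email": "tasks-invoker@my-project.iam.gserviceaccount.com"
        },
    }
}

response = client.create_task(request={"parent": parent, "task": task})
print(f"Created task {response.name}")
```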
Cloud Pub/Sub provides support for pull-based processing.
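A pull-based consumer with the Python client (google-cloud-pubsub 2.x) might look like the following sketch; the project and subscription IDs are placeholders:

```python
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "my-subscription")

# Synchronous pull: fetch up to 10 messages, process them, then acknowledge.
response = subscriber.pull(
    request={"subscription": subscription_path, "max_messages": 10}
)
for received in response.received_messages:
    print("Processing:", received.message.data)

if response.received_messages:
    subscriber.acknowledge(
        request={
            "subscription": subscription_path,
            "ack_ids": [m.ack_id for m in response.received_messages],
        }
    )
```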
Cloud Build and Cloud Deploy both seem to be recommended CI/CD tools within Google Cloud, with similar functionality. Would I use one over the other, or maybe together?
Cloud Build seems to be the de facto tool, while Cloud Deploy says it can do "pipeline and promotion management."
Both are designed as serverless: you don't manage the underlying infrastructure of your builds, and you define delivery pipelines in a YAML configuration file. However, Cloud Deploy also requires a Skaffold configuration, which it uses to perform its render and deploy operations.
And according to this documentation,
Google Cloud Deploy is a service that automates delivery of your applications to a series of target environments in a defined sequence.
Cloud Deploy is an opinionated, continuous delivery system currently supporting Kubernetes clusters and Anthos. It picks up after the CI process has completed (i.e. the artifact/images are built) and is responsible for delivering the software to production via a progression sequence defined in a delivery pipeline.
While Google Cloud Build is a service that executes your builds on Google Cloud:
Cloud Build (GCB) is Google's cloud Continuous Integration/Continuous Delivery (CI/CD) solution. It takes user code stored in Cloud Source Repositories, GitHub, Bitbucket, or other solutions; builds it; runs tests; and saves the results to an artifact repository like Google Container Registry, Artifactory, or a Cloud Storage bucket. It also supports complex builds with multiple steps, for example testing and deployment; adding a CI step to your pipeline is as easy as adding an additional step (see the sketch below). You can take your artifacts, whether built locally or already stored at your destination, and easily deploy them to many services with a deployment strategy of your choice.
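As an illustration, a minimal hedged sketch of a cloudbuild.yaml along those lines; the image name is a placeholder, and the test step assumes pytest is installed in the built image:

```yaml
steps:
  # Build the container image from the repo's Dockerfile.
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'gcr.io/$PROJECT_ID/my-app', '.']
  # Run the test suite inside the freshly built image.
  - name: 'gcr.io/$PROJECT_ID/my-app'
    entrypoint: 'pytest'
# Push the image to Container Registry on success.
images:
  - 'gcr.io/$PROJECT_ID/my-app'
```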
You would need to provide more details to choose between the two services, and it will still depend on your use case. However, their stated objectives might make the choice easier:
Cloud Build's mission is to help GCP users build better software
faster, more securely by providing a CI/CD workflow automation product for
developer teams and other GCP services.
Cloud Deploy's mission is to make it easier to set up and run continuous
software delivery to a Google Kubernetes Engine environment.
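To make the "progression sequence" idea concrete, here is a hedged sketch of a Cloud Deploy delivery pipeline definition; the pipeline name and target IDs (staging, prod) are assumptions, and the targets themselves would be defined separately:

```yaml
apiVersion: deploy.cloud.google.com/v1
kind: DeliveryPipeline
metadata:
  name: my-pipeline
description: Promote releases from staging to prod in sequence
serialPipeline:
  stages:
    # Each stage references a target defined in its own config;
    # releases are promoted through these stages in order.
    - targetId: staging
    - targetId: prod
```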
In addition, refer to this documentation for pricing information: Cloud Build pricing and Cloud Deploy pricing.
I need to run a data transformation pipeline composed of several scripts in distinct projects = Python repos.
I am thinking of using Compute Engine to run these scripts in VMs when needed, as I can manage the resources required.
I need to be able to orchestrate these scripts, in the sense that I want to run steps sequentially and sometimes asynchronously.
I see that GCP provides a Workflows component which seems to suit this case.
I am thinking of creating a specific project to orchestrate the execution of the scripts.
However, I cannot see how I can trigger the execution of my scripts, which will not be in the same repo as the orchestrator project. From what I understand of GCE, VMs are only created when scripts are executed and provide no persistent HTTP endpoints that could be called to trigger the execution from elsewhere.
To illustrate, let's say I have two projects, step_1 and step_2, which contain separate steps of my data transformation pipeline.
I would also have a project, orchestrator, whose only use is triggering step_1 and step_2 sequentially in VMs with GCE. This project would not have access to the code repos of the two former projects.
What would be the best practice in this case? Should I use components other than GCE and Workflows for this, or is there a way to trigger scripts in GCE from an independent orchestration project?
One possible solution would be to not use GCE (Google Compute Engine) but instead create Docker containers that contain your task steps. These would then be registered with Cloud Run. Cloud Run spins up Docker containers on demand and charges you only for the time spent processing a request; when the request ends, you are no longer charged, so you consume resources optimally. Various events can trigger a request in Cloud Run, but the most common is a REST call.

With this in mind, assume that your Python code is packaged in a container and triggered by a REST server (e.g. Flask), as sketched below. Effectively you have created "microservices". These services can then be orchestrated by Cloud Workflows. The invocation of these microservices is through REST endpoints, which can be Internet addresses with authorization also present. This allows the microservices (tasks/steps) to live in separate GCP projects while the orchestrator sees them as distinct callable endpoints.
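A minimal sketch of such a microservice wrapper, assuming Flask; the route name and the fake result are placeholders for your real step logic:

```python
import os
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/run", methods=["POST"])
def run():
    # Call into your existing transformation code here.
    rows_processed = 42  # placeholder for the real step's result
    return jsonify({"status": "done", "rows_processed": rows_processed})

if __name__ == "__main__":
    # Cloud Run provides the listening port via the PORT environment variable.
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
```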
Other potential solutions to look at would be GKE (Kubernetes) and Cloud Composer (Apache Airflow).
If you DO wish to stay with Compute Engine, you can still do that using a Shared VPC. A Shared VPC would allow distinct projects to have network connectivity between each other, and you could use Private Catalog to have the GCE instances advertise to each other. You could then have a GCE instance choreograph or, again, choreograph through Cloud Workflows (see the sketch after this paragraph). We would have to check whether Cloud Workflows supports parallel steps; I do not believe it does as of the time of this post.
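For what it's worth, a hedged sketch of a Cloud Workflows definition that runs two such Cloud Run microservices in sequence; the service URLs are placeholders, and the OIDC auth assumes the workflow's service account is allowed to invoke them:

```yaml
main:
  steps:
    - run_step_1:
        call: http.post
        args:
          url: https://step-1-service-xyz-uc.a.run.app/run
          auth:
            type: OIDC
        result: step_1_result
    - run_step_2:
        call: http.post
        args:
          url: https://step-2-service-xyz-uc.a.run.app/run
          auth:
            type: OIDC
        result: step_2_result
    - done:
        return: ${step_2_result.body}
```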
This is a common request: to organize automation into its own project. You can set up a service account that spans multiple projects.
See a tutorial here: https://gtseres.medium.com/using-service-accounts-across-projects-in-gcp-cf9473fef8f0
On top of that, you can also consider having Workflows in both the orchestrator and sublevel projects. This way the orchestrator workflow can call another workflow (a sketch follows), so the job can be easily run and also encapsulated under the project that has the code + workflow body, with only the triggering coming from the other project.
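As a hedged sketch, the orchestrator workflow could start an execution of a step project's workflow through the Workflow Executions connector; the project, location, and workflow names are placeholders. Note that executions.create starts the execution without waiting for it to finish; you would poll executions.get (or use a blocking run helper, if your connectors version provides one) to sequence the steps:

```yaml
main:
  steps:
    - trigger_step_1:
        call: googleapis.workflowexecutions.v1.projects.locations.workflows.executions.create
        args:
          # The sub-workflow lives in a different project than the orchestrator.
          parent: projects/step-1-project/locations/us-central1/workflows/step-1-workflow
          body:
            argument: '{"run_date": "2023-01-01"}'
        result: exec_1
    - report:
        return: ${exec_1.name}
```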
I'm working on a Cloud Run docker application that handles a few long-running data integration processes.
I'm struggling to come up with a way to locally run/test my submissions to Cloud Tasks before actually deploying the container to Cloud Run.
Is there any way to do this?
A local emulator for Cloud Tasks is not available yet; in some cases you can substitute Cloud Tasks with Pub/Sub.
Also, consider using non-Google solutions such as Cloud-Tasks-In-Process-Emulator, gcloud-tasks-emulator 0.5.1, or Cloud tasks emulator. Once an emulator is running, the client can be pointed at it, as sketched below.
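For example, with gcloud-tasks-emulator the Python client can be pointed at the local gRPC endpoint instead of the real service; the port below is an assumption, so match it to whatever the emulator reports on startup:

```python
import grpc
from google.cloud.tasks_v2 import CloudTasksClient
from google.cloud.tasks_v2.services.cloud_tasks.transports import CloudTasksGrpcTransport

# Connect to the local emulator over an insecure channel (no credentials needed).
channel = grpc.insecure_channel("127.0.0.1:9022")
client = CloudTasksClient(transport=CloudTasksGrpcTransport(channel=channel))

# From here on, the client is used exactly as in production code.
parent = client.queue_path("my-project", "us-central1", "my-queue")
```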
As I understand it, you want to test Cloud Tasks locally. Yes, it is possible by using ngrok: ngrok exposes your local application publicly (e.g. `ngrok http 8080` if your handler listens on port 8080), and Cloud Tasks needs a public URL for the task handler.
Use case: our requirement is to run a service every few minutes. This service reads a value from Datastore and hits a public URL using that value (stateful). The service doesn't have a front end, and nobody would be accessing it publicly; a new value is stored in Datastore as a result of the response from the URL. Exactly one server is required to run.
We need to decide on one of the below for our use case.
Compute Engine (IaaS -> we don't want to maintain the infra for this simple stateful application)
Kubernetes Engine (still feels like overkill)
App Engine (PaaS -> App Engine is usually used for mobile apps, gaming, and websites, and it provides a URL with a web address. Is it the right choice for our use case? If we choose App Engine, is it possible to block public access to the App Engine URL? Also, as one instance would be running continuously, which is more cost-effective: standard or flexible?)
Cloud Functions -> event-driven (looks unsuitable for our application)
Google Cloud Scheduler -> we thought we could use Cloud Scheduler + Cloud Functions, but during an outage, jobs are queued up. In our case, after the outage, only one server/instance/job should be up and running.
Thanks!
after outage, only one server/instance/job could be up and running
Is limiting Cloud Functions concurrency enough? If so, you can do this:
gcloud functions deploy FUNCTION_NAME --max-instances 1 FLAGS...
https://cloud.google.com/functions/docs/max-instances
I also recommend taking a look at Google Cloud Run: it is a serverless Docker platform, and it can be limited to a maximum of 1 instance responding to a maximum of 1 concurrent request. It would require Cloud Scheduler too, making regular HTTP requests to it.
With both services configured with a max concurrency of 1, only one server/instance/job will be up and running; but after an outage, the next job may be scheduled as soon as the previous one finishes. If this is problematic, add a lastRun datetime field to the Datastore job row and skip the run if it is too recent (a sketch follows the linked question), or disable the retry in Cloud Scheduler, as explained here:
Google Cloud Tasks HTTP trigger - how to disable retry
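A hedged sketch of that lastRun guard with the Python Datastore client; the entity kind ("JobState"), key name, property name, and 5-minute window are all assumptions:

```python
from datetime import datetime, timedelta, timezone
from google.cloud import datastore

client = datastore.Client()
key = client.key("JobState", "my-job")
entity = client.get(key) or datastore.Entity(key=key)

last_run = entity.get("last_run")
now = datetime.now(timezone.utc)
if last_run and now - last_run < timedelta(minutes=5):
    # A previous run happened too recently (e.g. queued-up post-outage jobs).
    print("Ran recently; skipping.")
else:
    entity["last_run"] = now
    client.put(entity)
    # ... do the actual work here ...
```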
I am evaluating Stackdriver from GCP for logging across multiple microservices.
Some of these services are deployed on premise and some of them are on AWS/GCP.
Our services are either .NET or Node.js based apps, and we are invested in winston for Node.js and NLog in .NET.
I was looking at integrating our on-premise Node.js application with Stackdriver Logging. Looking at the documentation (https://cloud.google.com/logging/docs/setup/nodejs), it seems we need to install the agent for any machine other than Google Compute instances. Is this correct?
If we need to install the agent, is there any way I can test the logging during development? The development environment is either Windows 10 or Mac.
There's a new option for ingesting logs (and metrics) with Stackdriver, as most of the non-Google-environment agents look like they are being deprecated: https://cloud.google.com/stackdriver/docs/deprecations/third-party-apps
A Google post on logging on-prem resources with Stackdriver and Blue Medora:
https://cloud.google.com/solutions/logging-on-premises-resources-with-stackdriver-and-blue-medora
For logs, you still need to install an agent on each box to collect them; it's a BindPlane agent, not a Google agent.
For Node.js, you can use the @google-cloud/logging-winston and @google-cloud/logging-bunyan modules from anywhere (on-prem, AWS, GCP, etc.). You will need to provide the projectId and auth credentials manually if not running on GCP; instructions on how to set these up are available in the linked pages.
When running on GCP, we figure out the exact environment (App Engine, Compute Engine, etc.) automatically, and the logs should show up under those resources in the Logging UI. If you use the modules from your development machines, we will report the logs against the 'global' resource by default; you can customize this by passing a specific resource descriptor yourself. A minimal winston setup might look like the sketch below.
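For example (a hedged sketch; the project ID and key file path are placeholders):

```javascript
const winston = require('winston');
const {LoggingWinston} = require('@google-cloud/logging-winston');

// Outside GCP, pass the project ID and service-account key explicitly.
const loggingWinston = new LoggingWinston({
  projectId: 'my-project-id',
  keyFilename: '/path/to/service-account-key.json',
});

const logger = winston.createLogger({
  level: 'info',
  // Log to the console locally and to Cloud Logging simultaneously.
  transports: [new winston.transports.Console(), loggingWinston],
});

logger.info('Hello from on-prem!');
```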
Let us know if you run into any trouble.
I tried setting this up on my local k8s cluster by following this: https://kubernetes.io/docs/tasks/debug-application-cluster/logging-stackdriver/
But I couldn't get it to work; the fluentd-gcp-v2.0-qhqzt pod keeps crashing.
Also, the page mentions that there are multiple known issues with Stackdriver logging if you DON'T use it on Google GKE.
I think Google is trying to lock you into GKE.