How I can trigger a Cloud Run Job or Cloud Function after a Cloud Run Job finishes all its tasks? - google-cloud-platform

I have a Cloud Scheduler that triggers a Cloud Run Job to run and parallelize the payload into 3 tasks using the CLOUD_RUN_TASK_INDEX environment variable.
I would like to trigger another Cloud Run Job or Cloud Function after the previous Cloud Run Job finishes all 3 tasks. I looked through the Google documentation but could not find any reference for accomplishing this. How can it be done?
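For context, splitting the payload across tasks with CLOUD_RUN_TASK_INDEX typically looks something like the following sketch (the helper name is hypothetical):

```python
import os

def shard_items(items, task_index=None, task_count=None):
    """Return the subset of `items` this Cloud Run Job task should process."""
    if task_index is None:
        task_index = int(os.environ.get("CLOUD_RUN_TASK_INDEX", "0"))
    if task_count is None:
        task_count = int(os.environ.get("CLOUD_RUN_TASK_COUNT", "1"))
    # Round-robin split: task i takes every task_count-th item starting at i.
    return items[task_index::task_count]
```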

In your case, the most efficient option (IMO) is to use Cloud Workflows with the Cloud Run jobs connector. The connector runs the job and waits for its completion, after which you can continue the workflow with the other tasks you want to execute.
Cloud Workflows also supports parallel execution.
Another idea could be to use Eventarc to catch the audit log entry that marks the end of the job processing. But I prefer the first solution.
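A minimal sketch of the Workflows approach, assuming the Cloud Run Admin API connector (googleapis.run.v1.namespaces.jobs.run) and hypothetical project and job names:

```yaml
main:
  steps:
    - runFirstJob:
        # The connector blocks until the job execution (all 3 tasks) completes.
        call: googleapis.run.v1.namespaces.jobs.run
        args:
          name: namespaces/my-project/jobs/first-job
          location: us-central1
    - runFollowUpJob:
        call: googleapis.run.v1.namespaces.jobs.run
        args:
          name: namespaces/my-project/jobs/second-job
          location: us-central1
```

Cloud Scheduler would then trigger the workflow execution instead of the first job directly.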

Related

Triggering an alert when multiple dataflow jobs run in parallel in GCP

I am using Google Cloud Dataflow to execute some resource-intensive Dataflow jobs, and at any given time my system must execute no more than 2 jobs in parallel.
Since each job is quite resource-intensive, I am looking for a way to trigger an alert when more than 2 Dataflow jobs are running.
I tried implementing a custom counter which increments after the start of each job, but the counter is only reported after the job has executed, and by then it might be too late to trigger an alert.
You could modify the project's dataflow.googleapis.com/job_count quota to be limited to 1, so that no two jobs could run in parallel in that project. The quota is at the project level, so it would not affect other projects.
Another option is to use a GCP monitoring system that observes the running Dataflow jobs. You can, for example, use Elastic Cloud (available via the Marketplace) to load all relevant metrics and logs. Elastic can visualize and alert on every state you are interested in.
I found this Terraform project very helpful for getting started with that approach.

How to set up a long running Django command in Google Cloud Platform

I have recently moved my site to Google Cloud Run.
The problem is I also need to move a couple of cron jobs that run a Django command every day inside a container. What is the preferred way of doing this if I don't want to pay for a full Kubernetes cluster with always-running node instances?
I would like the task to run and then spin the server down, just as Cloud Run does when I get an incoming request. I have searched through all the documentation, but I am having trouble finding the correct solution for long-running tasks inside containers that do not require an underlying server in Google Cloud.
Can someone point me in the right direction?
Cloud Run request timeout limit is 15 minutes.
Cloud Functions function timeout limit is 540 seconds.
For long-running tasks, spinning a Compute Engine instance up and down when needed is the preferred option.
An example of how to schedule, run and stop Compute Instances automatically is nicely explained here:
Scheduling compute instances with Cloud Scheduler
In brief: the actual instance start/stop is performed by Cloud Functions. Cloud Scheduler publishes the required tasks to a Cloud Pub/Sub topic on a timetable, which triggers these functions. At the end of its main logic, your code can also publish a message to Cloud Pub/Sub to run the "stop this instance" task.
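A sketch of the Cloud Function side of this pattern (the payload fields and names are hypothetical; the Compute Engine call is made through the google-api-python-client library):

```python
import base64
import json

def parse_pubsub_message(envelope):
    """Decode a Pub/Sub push envelope into the task payload dict."""
    # Pub/Sub delivers {"message": {"data": "<base64-encoded JSON>"}}.
    data = envelope["message"]["data"]
    return json.loads(base64.b64decode(data).decode("utf-8"))

def handle_instance_event(envelope):
    """Cloud Function entry point: start or stop the named Compute Engine instance."""
    payload = parse_pubsub_message(envelope)
    # Deferred import so the pure parsing logic above has no GCP dependency;
    # requires the google-api-python-client package when actually deployed.
    from googleapiclient import discovery
    compute = discovery.build("compute", "v1")
    method = getattr(compute.instances(), payload["action"])  # "start" or "stop"
    return method(project=payload["project"], zone=payload["zone"],
                  instance=payload["instance"]).execute()
```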
How to process task in Django?
it can be the same Django app started with a WSGI server to process incoming requests (like a regular Django site) but with increased request/response and other timeouts and a long WSGI worker life; in this case a task is a regular HTTP request to a Django view
it can be just one script (or Django management command) run at instance startup to automatically execute one task
you may also want to pass additional arguments for the task; in this case you can publish to Cloud Pub/Sub one "start instance" task and one main-logic task with custom arguments, and make your code pull from Pub/Sub first
more Django-native: use Celery and start a Celery worker as a separate Compute Engine instance
One possible option for using just one Celery worker without all the other parts (i.e. a broker; there is no official built-in Cloud Pub/Sub support) and pulling/pushing tasks from/to Cloud Pub/Sub:
run the Celery worker with a dummy filesystem broker
add the target method as a @periodic_task to run e.g. every 30 seconds
at the start of the task, subscribe to the Cloud Pub/Sub queue, check for a new task, receive one and start processing
at the end of the task, publish the results to Cloud Pub/Sub and a call to stop this instance
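A rough sketch of that setup (untested; assumes the celery and google-cloud-pubsub packages, hypothetical project and subscription names, and API signatures that may vary by library version):

```python
from celery import Celery

# Dummy filesystem broker: no external broker service needed for one worker.
app = Celery("worker", broker="filesystem://")
app.conf.broker_transport_options = {
    "data_folder_in": "/tmp/celery",
    "data_folder_out": "/tmp/celery",
}
# Poll Pub/Sub every 30 seconds via Celery beat.
app.conf.beat_schedule = {
    "poll-pubsub": {"task": "tasks.poll_pubsub", "schedule": 30.0},
}

@app.task(name="tasks.poll_pubsub")
def poll_pubsub():
    from google.cloud import pubsub_v1  # deferred; needs google-cloud-pubsub
    subscriber = pubsub_v1.SubscriberClient()
    subscription = subscriber.subscription_path("my-project", "worker-tasks")
    response = subscriber.pull(subscription=subscription, max_messages=1)
    for received in response.received_messages:
        # ... process received.message.data, publish the results to Pub/Sub,
        # then publish the "stop this instance" message as described above.
        subscriber.acknowledge(subscription=subscription,
                               ack_ids=[received.ack_id])
```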
There is also Cloud Tasks (timeout limit: 10 minutes with automatic startup, 24 hours with manual startup) as a Cloud Run addition for asynchronous tasks, but in this case Cloud Pub/Sub is more suitable.

Scheduling cron jobs on Google Cloud DataProc

I currently have a PySpark job that is deployed on a DataProc cluster (1 master & 4 worker nodes with sufficient cores and memory). This job runs on millions of records and performs an expensive computation (Point in Polygon). I am able to successfully run this job by itself. However, I want to schedule the job to be run on the 7th of every month.
What I am looking for is the most efficient way to set up cron jobs on a DataProc cluster. I tried reading up on Cloud Scheduler, but it doesn't exactly explain how it can be used in conjunction with a DataProc cluster. It would be really helpful to see either an example of a cron job on DataProc or some documentation on DataProc and Scheduler working together.
Thanks in advance!
For scheduled Dataproc interactions (create cluster, submit job, wait for job, delete cluster, while also handling errors), Dataproc's Workflow Templates API is a better choice than trying to orchestrate these yourself. A key advantage is that Workflows are fire-and-forget, and any clusters created will also be deleted on completion.
If your Workflow Template is relatively simple, such that its parameters do not change between invocations, a simpler way to schedule it would be to use Cloud Scheduler. Cloud Functions are a good choice if you need to run a workflow in response to files in GCS or events in Pub/Sub. Finally, Cloud Composer is great if your workflow parameters are dynamic or there are other GCP products in the mix.
Assuming your use case is simply running the workflow every so often with the same parameters, I'll demonstrate using Cloud Scheduler:
I created a workflow in my project called terasort-example.
I then created a new Service Account in my project, called workflow-starter@example.iam.gserviceaccount.com, and gave it the Dataproc Editor role; however, something more restricted with just dataproc.workflows.instantiate is also sufficient.
After enabling the Cloud Scheduler API, I headed over to Cloud Scheduler in the Developers Console. I created a job as follows:
Target: HTTP
URL: https://dataproc.googleapis.com/v1/projects/example/regions/global/workflowTemplates/terasort-example:instantiate?alt=json
HTTP Method: POST
Body: {}
Auth Header: OAuth Token
Service Account: workflow-starter@example.iam.gserviceaccount.com
Scope: (left blank)
You can test it by clicking Run Now.
Note you can also copy the entire workflow content in the Body as JSON payload. The last part of the URL would become workflowTemplates:instantiateInline?alt=json
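Equivalently, the same Scheduler job can be created from the command line; a gcloud sketch (the schedule, project, and service account values are placeholders):

```shell
gcloud scheduler jobs create http terasort-monthly \
  --schedule="0 4 7 * *" \
  --uri="https://dataproc.googleapis.com/v1/projects/example/regions/global/workflowTemplates/terasort-example:instantiate?alt=json" \
  --http-method=POST \
  --message-body="{}" \
  --oauth-service-account-email=workflow-starter@example.iam.gserviceaccount.com
```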
Check out this official doc that discusses other scheduling options.
Please see the other answer for a more comprehensive solution.
What you will have to do is publish an event to a Pub/Sub topic from Cloud Scheduler and then have a Cloud Function react to that event.
Here's a complete example of using Cloud Function to trigger Dataproc:
How can I run create Dataproc cluster, run job, delete cluster from Cloud Function

How do I run a serverless batch job in Google Cloud

I have a batch job that takes a couple of hours to run. How can I run this in a serverless way on Google Cloud?
AppEngine, Cloud Functions, and Cloud Run are limited to 10-15 minutes. I don't want to rewrite my code in Apache Beam.
Is there an equivalent to AWS Batch on Google Cloud?
Note: Cloud Run and Cloud Functions can now last up to 60 minutes. The answer below remains a viable approach if you have a multi-hour job.
Vertex AI Training is serverless and long-lived. Wrap your batch processing code in a Docker container, push to gcr.io and then do:
gcloud ai custom-jobs create \
--region=LOCATION \
--display-name=JOB_NAME \
--worker-pool-spec=machine-type=MACHINE_TYPE,replica-count=REPLICA_COUNT,executor-image-uri=EXECUTOR_IMAGE_URI,local-package-path=WORKING_DIRECTORY,script=SCRIPT_PATH
You can run any arbitrary Docker container — it doesn’t have to be a machine learning job. For details, see:
https://cloud.google.com/vertex-ai/docs/training/create-custom-job#create_custom_job-gcloud
Today you can also use Cloud Batch: https://cloud.google.com/batch/docs/get-started#create-basic-job
Google Cloud does not offer a comparable product to AWS Batch (see https://cloud.google.com/docs/compare/aws/#service_comparisons).
Instead you'll need to use Cloud Tasks or Pub/Sub to delegate the work to another product, such as Compute Engine, but this lacks the ability to do this in a "serverless" way.
Finally Google released (in Beta for the moment) Cloud Batch which does exactly what you want.
You push jobs (containers or scripts) and it runs. Simple as that.
https://cloud.google.com/batch/docs/get-started
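A minimal Cloud Batch submission, following the quickstart linked above (the job name, region, and spec contents are illustrative):

```shell
# Write a minimal job spec, then submit it to Cloud Batch.
cat > job.json <<'EOF'
{
  "taskGroups": [{
    "taskCount": 1,
    "taskSpec": {
      "runnables": [{"script": {"text": "echo Hello from Cloud Batch"}}]
    }
  }]
}
EOF
gcloud batch jobs submit hello-batch --location=us-central1 --config=job.json
```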
This answer to "How to make GCE instance stop when its deployed container finishes?" will work for you as well:
In short:
First dockerize your batch process.
Then, create an instance:
Using a container-optimized image
And using a startup script that pulls your Docker image, runs it, and shuts down the machine at the end.
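A sketch of such a startup script, assuming a hypothetical image name and a VM image that ships with Docker and runs the script as root:

```shell
#!/bin/bash
# Pull and run the batch container, blocking until it exits.
docker pull gcr.io/my-project/batch-job:latest
docker run --rm gcr.io/my-project/batch-job:latest
# Power the VM off when the container finishes, stopping compute billing.
shutdown -h now
```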
I have faced the same problem. In my case I went for:
Cloud Scheduler to start the job by pushing to Pub/Sub.
Pub/Sub triggers Cloud Functions.
Cloud Functions starts a Compute Engine instance.
Compute Engine runs the batch workload and auto-kills the instance once it's done. You can read my post on Medium:
https://link.medium.com/1K3NsElGYZ
It might help you get started. There's also a follow up post showing how to use a Docker container inside the Compute Engine instance: https://medium.com/google-cloud/serverless-batch-workload-on-gcp-adding-docker-and-container-registry-to-the-mix-558f925e1de1
You can use Cloud Run. At the time of writing this, the timeout of Cloud Run (fully managed) has been increased to 60 minutes, but in beta.
https://cloud.google.com/run/docs/configuring/request-timeout
Important: Although Cloud Run (fully managed) has a maximum timeout of 60 minutes, only timeouts of 15 minutes or less are generally available: setting timeouts greater than 15 minutes is a Beta feature.
Another alternative for batch computing is using Google Cloud Life Sciences.
An example application using Cloud Life Sciences is dsub.
Or see the Cloud Life Sciences Quickstart documentation.
I found myself looking for a solution to this problem and built something similar to what mesmacosta has described in a different answer, in the form of a reusable tool called gcp-runbatch.
If you can package your workload into a Docker image then you can run it using gcp-runbatch. When triggered, it will do the following:
Create a new VM
On VM startup, docker run the specified image
When the docker run exits, the VM will be deleted
Some features that are supported:
Invoke batch workload from the command line, or deploy as a Cloud Function and invoke that way (e.g. to trigger batch workloads via Cloud Scheduler)
stdout and stderr will be piped to Cloud Logging
Environment variables can be specified by the invoker, or pulled from Secret Manager
Here's an example command line invocation:
$ gcp-runbatch \
--project-id=long-octane-350517 \
--zone=us-central1-a \
--service-account=1234567890-compute@developer.gserviceaccount.com \
hello-world
Successfully started instance runbatch-38408320. To tail batch logs run:
CLOUDSDK_PYTHON_SITEPACKAGES=1 gcloud beta --project=long-octane-350517
logging tail 'logName="projects/long-octane-350517/logs/runbatch" AND
resource.labels.instance_id="runbatch-38408320"' --format='get(text_payload)'
GCP launched their new "Batch" service in July '22. It is basically Compute Engine packaged with some utilities to easily productionize a batch job, including defining required resources, executables (script- or container-based), and a run schedule.
Haven't used it yet, but seems like a great fit for batch jobs that take over 1 hr.

How to delete files from cloud storage after dataflow job completes

In GCP, I have a Dataflow job that copies files from Cloud Storage to BigQuery. I would like to delete these files once they are successfully inserted into BigQuery. Can someone provide pointers on how to achieve this, and also on how to trigger another job after the previous one has succeeded?
For these types of scenarios, it's normally recommended that you introduce a tool for scheduling and workload orchestration into your architecture. Google Cloud provides Cloud Composer, a managed version of Airflow, to solve exactly this use case. You can schedule a DAG (directed acyclic graph) in Composer to start your Dataflow job and then, on the success of a job run, execute additional tasks for file cleanup or to kick off the next process.
Example DAG
To get started I recommend checking out the Cloud Composer documentation as well as these Cloud Composer Examples which seem similar to your use case.
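As a sketch, such a DAG could look like the following (untested; the operator names come from the apache-airflow-providers-google package, and the bucket, template, and job names are hypothetical):

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataflow import (
    DataflowTemplatedJobStartOperator,
)
from airflow.providers.google.cloud.operators.gcs import GCSDeleteObjectsOperator

with DAG("gcs_to_bq_cleanup", start_date=datetime(2023, 1, 1),
         schedule_interval="@daily", catchup=False) as dag:
    # Launch the templated Dataflow job and wait for it to finish.
    run_dataflow = DataflowTemplatedJobStartOperator(
        task_id="run_dataflow",
        template="gs://my-bucket/templates/gcs_to_bq",
        location="us-central1",
    )
    # Remove the ingested source files from Cloud Storage.
    delete_files = GCSDeleteObjectsOperator(
        task_id="delete_source_files",
        bucket_name="my-input-bucket",
        prefix="incoming/",
    )
    # delete_files runs only if run_dataflow succeeds (default trigger rule).
    run_dataflow >> delete_files
```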