How can I run a background worker process in Google Cloud? - google-cloud-platform

As per the title, how can I run a background worker process in Google Cloud, like Heroku worker dynos?
I read the Google Cloud documentation, and the articles seem to assume I always want to deploy a web application. I don't want a web application at all. And then there is other documentation on Cloud Pub/Sub, Task Queues, Cloud Tasks, Cloud Functions, Cron, etc., which all seem to be just different kinds of event-triggered one-off routines.
What I want is just a worker process that does stuff and updates the database, and that can gracefully shut down when requested, like on SIGTERM in Heroku.

Short answer: A container on Google Kubernetes Engine.
All of the GCP solutions you mentioned require a trigger, whether an HTTP request, an event, a task, or a schedule, in order to run your code.
If you just want to have a job running in the background, you can create a container that runs a single never-ending process (e.g. Java, Node, etc.) and deploy it to GKE (check out DaemonSet and StatefulSet).
Alternative solution: Google Compute Engine.
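Either way, the worker itself is just a long-lived loop that also covers the graceful-shutdown requirement: GKE (like Heroku) sends SIGTERM before stopping the container. A minimal sketch, assuming Python, where do_work is a hypothetical placeholder for your actual logic:

    import signal
    import time

    shutting_down = False

    def handle_sigterm(signum, frame):
        # GKE sends SIGTERM before killing the container (SIGKILL follows after a grace period);
        # flip a flag so the loop can finish the current unit of work cleanly.
        global shutting_down
        shutting_down = True

    signal.signal(signal.SIGTERM, handle_sigterm)

    def do_work():
        # Hypothetical placeholder: read from the database, do stuff, update rows.
        pass

    while not shutting_down:
        do_work()
        time.sleep(5)  # avoid a hot loop between units of work

    print("SIGTERM received, exiting cleanly")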

Related

How can I make a Spring Boot application start daily and shut down after completing its process in GCP?

I will be using Helm charts to deploy my application in GCP. I want my Spring Boot application to shut down after completing its process, then start on its own at a particular time the next day, complete the process, and shut down again. Is this possible in GCP?
You can use Cloud Run Jobs or Batch for that, and you pay ONLY when your process consumes resources. (That is not the case with Kubernetes and GKE, where you pay for your cluster and its node pools even if nothing runs on them.)
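The run-to-completion model is simple: the container does its work once and exits, and Cloud Scheduler triggers the next execution the following day. A minimal sketch of a job entrypoint, assuming Python (process_once is a hypothetical placeholder for the actual work):

    import sys

    def process_once():
        # Hypothetical placeholder for the daily processing logic.
        return True

    if __name__ == "__main__":
        ok = process_once()
        # A non-zero exit status marks the Cloud Run Job execution as failed,
        # so the platform can retry it according to the job's retry settings.
        sys.exit(0 if ok else 1)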

Is it possible to simulate a Google Cloud Task locally?

I'm working on a Cloud Run docker application that handles a few long-running data integration processes.
I'm struggling to come up with a way to locally run/test my submissions to Cloud Tasks before actually deploying the container to Cloud Run.
Is there any way to do this?
A local emulator for Cloud Tasks is not available yet; in some cases you can substitute Cloud Tasks with Pub/Sub.
Also, consider using non-Google solutions such as Cloud-Tasks-In-Process-Emulator, gcloud-tasks-emulator 0.5.1 or Cloud tasks emulator.
As I understand it, you want to test the Cloud Task locally. Yes, it is possible by using ngrok: ngrok exposes your local application on a public URL, and Cloud Tasks needs a public URL as the task handler.
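A sketch of that approach, assuming the google-cloud-tasks Python client (the project, location, queue name and ngrok URL are placeholders): you point the task's HTTP target at the ngrok tunnel that forwards to the handler running on your machine.

    from google.cloud import tasks_v2

    client = tasks_v2.CloudTasksClient()
    parent = client.queue_path("my-project", "us-central1", "my-queue")  # placeholders

    task = {
        "http_request": {
            "http_method": tasks_v2.HttpMethod.POST,
            # Public ngrok URL tunnelling to the local Cloud Run container.
            "url": "https://example.ngrok.io/tasks/handle",
            "headers": {"Content-Type": "application/json"},
            "body": b'{"integration_id": 123}',
        }
    }

    response = client.create_task(request={"parent": parent, "task": task})
    print("Created task:", response.name)

The Cloud Tasks queue itself still lives in GCP; only the handler runs locally behind the tunnel.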

Google Cloud Platform design for a stateful application

Use case: our requirement is to run a service every few minutes, continuously. This service reads a value from Datastore and hits a public URL using that value (stateful). The service doesn't have a front end; nobody would be accessing it publicly. A new value is stored in Datastore as a result of the response from the URL. Exactly one server is required to run.
We are in need to decide one of the below for our use case.
Compute Engine (IaaS -> we don't want to maintain the infrastructure for this simple stateful application)
Kubernetes Engine (still feels like overkill)
App Engine (PaaS -> App Engine is usually used for mobile apps, gaming and websites, and it provides a public web URL. Is it the right choice for our use case? If we choose App Engine, is it possible to disable the public App Engine URL? Also, as one instance would be running continuously, which is more cost effective - standard or flexible?)
Cloud Functions (event-driven; looks not suitable for our application)
Google Cloud Scheduler (we thought we could use Cloud Scheduler + Cloud Functions, but during an outage jobs are queued up; in our case, after an outage, only one server/instance/job should be up and running)
Thanks!
after outage, only one server/instance/job could be up and running
Is limiting Cloud Functions concurrency enough? If so, you can do this:
gcloud functions deploy FUNCTION_NAME --max-instances 1 FLAGS...
https://cloud.google.com/functions/docs/max-instances
I also recommend taking a look at Google Cloud Run. It is a serverless container platform that can be limited to a maximum of 1 instance serving a maximum of 1 concurrent request. It would also require Cloud Scheduler making regular HTTP requests to it.
With both services configured with a max concurrency of 1, only one server/instance/job will be up and running, but after an outage jobs may be scheduled as soon as another finishes. If this is problematic, consider adding a lastRun datetime field to the Datastore job row and skipping the run if it is too recent, or disabling retries in Cloud Scheduler, as described here:
Google Cloud Tasks HTTP trigger - how to disable retry
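A rough sketch of that lastRun guard, assuming the google-cloud-datastore Python client (the kind, key name and interval are placeholders):

    from datetime import datetime, timedelta, timezone
    from google.cloud import datastore

    client = datastore.Client()
    key = client.key("JobState", "singleton")  # hypothetical kind/name for the single job row

    def should_run(min_interval=timedelta(minutes=5)):
        # Skip this invocation if the previous run happened too recently,
        # e.g. when Cloud Scheduler replays queued-up jobs after an outage.
        entity = client.get(key) or datastore.Entity(key=key)
        last_run = entity.get("lastRun")
        now = datetime.now(timezone.utc)
        if last_run and now - last_run < min_interval:
            return False
        entity["lastRun"] = now
        client.put(entity)
        return True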

How to set up a long running Django command in Google Cloud Platform

I have recently moved my site to Google Cloud Run.
The problem is I also need to move a couple of cron jobs that run a Django command every day inside a container. What is the preferred way of doing this if I don't want to pay for a full Kubernetes cluster with always running node instances?
I would like the task to run and then spin the server down, just as Cloud Run does when I get an incoming request. I have searched through all the documentation, but I am having trouble in finding the correct solution for long running tasks inside containers that do not require an underlying server in Google Cloud.
Can someone point me in the right direction?
Cloud Run request timeout limit is 15 minutes.
Cloud Functions function timeout limit is 540 seconds.
For long-running tasks, spinning a Compute Engine instance up and down when needed would be the preferred option.
An example of how to schedule, run and stop Compute Instances automatically is nicely explained here:
Scheduling compute instances with Cloud Scheduler
In brief: the actual instance start/stop is performed by Cloud Functions. Cloud Scheduler, on a timetable, publishes the required tasks to a Cloud Pub/Sub topic, which triggers these functions. At the end of its main logic, your code can also publish a message to Cloud Pub/Sub to run the "stop this instance" task.
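For example, at the end of its work the task could publish the "stop this instance" message along these lines, assuming the google-cloud-pubsub Python client (the project, topic name and payload format are placeholders and must match whatever the stop function from the linked tutorial expects):

    import json
    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path("my-project", "stop-instance")  # placeholder topic

    # Payload the stop function reads to decide which VM to shut down (placeholder format).
    payload = json.dumps({"zone": "us-central1-a", "instance": "worker-1"}).encode("utf-8")

    future = publisher.publish(topic_path, payload)
    print("Published stop request:", future.result())  # blocks until the message is sent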
How to process the task in Django?
It can be the same Django app started with a WSGI server to process incoming requests (like a regular Django site), but with increased request/response and other timeouts and a long WSGI worker lifetime; in this case the task is a regular HTTP request to a Django view.
It can be just one script (or Django management command) run at instance startup to automatically execute one task.
You may also want to pass additional arguments for the task; in this case you can publish to Cloud Pub/Sub one "start instance" task and one main-logic task with custom arguments, and make your code pull from Pub/Sub first.
A more Django-native option: use Celery and start the Celery worker as a separate Compute Engine instance.
One possible option for using just one Celery worker without all the other parts (i.e. a broker; there is no official built-in Cloud Pub/Sub support) and pulling/pushing tasks from/to Cloud Pub/Sub, sketched below:
Run the Celery worker with a dummy filesystem broker.
Add the target method as a @periodic_task that runs e.g. every 30 seconds.
At the start of the task, subscribe to the Cloud Pub/Sub subscription, check for a new task, receive one and start processing.
At the end of the task, publish the results to Cloud Pub/Sub along with the "stop this instance" message.
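A rough sketch of that pattern, assuming Celery with a beat schedule and the google-cloud-pubsub Python client (the module name, project, subscription and the process function are placeholders; newer Celery versions favour beat_schedule over the @periodic_task decorator):

    import os
    from celery import Celery
    from google.api_core.exceptions import DeadlineExceeded
    from google.cloud import pubsub_v1

    # Dummy filesystem broker so no real message broker is needed.
    os.makedirs("/tmp/celery", exist_ok=True)
    app = Celery("worker", broker="filesystem://")
    app.conf.broker_transport_options = {
        "data_folder_in": "/tmp/celery",
        "data_folder_out": "/tmp/celery",
    }
    app.conf.beat_schedule = {
        "poll-pubsub": {"task": "worker.poll_pubsub", "schedule": 30.0},  # every 30 seconds
    }

    subscriber = pubsub_v1.SubscriberClient()
    subscription = subscriber.subscription_path("my-project", "worker-tasks")  # placeholders

    def process(data):
        # Hypothetical placeholder for the long-running main logic.
        pass

    @app.task(name="worker.poll_pubsub")
    def poll_pubsub():
        # Pull at most one pending task message; return quietly if there is none.
        try:
            response = subscriber.pull(
                request={"subscription": subscription, "max_messages": 1},
                timeout=10,
            )
        except DeadlineExceeded:
            return
        for msg in response.received_messages:
            process(msg.message.data)
            subscriber.acknowledge(
                request={"subscription": subscription, "ack_ids": [msg.ack_id]}
            )

Starting it with an embedded beat scheduler (celery -A worker worker -B) keeps everything inside the single Compute Engine instance.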
There is also Cloud Tasks (timeout limit: with auto-startup, 10 minutes; with manual startup, 24 hours) as a Cloud Run addition for asynchronous tasks, but in this case Cloud Pub/Sub is more suitable.

Using AWS SQS for Job Queuing but minimizing "workers" uptime

I am designing my first Amazon AWS project and I could use some help with the queue processing.
This service accepts processing jobs, either via an ASP.net Web API service or a GUI web site (which just calls the API). Each job has one or more files associated with it and some rules about the type of job. I want to queue each job as it comes in, presumably using AWS SQS. The jobs will then be processed by a "worker" which is a python script with a .Net wrapper. The python script is an existing batch processor that cannot be altered/customized for AWS, hence the wrapper in .Net that manages the AWS portions and passing in the correct params to python.
The issue is that we will not have a huge number of jobs, but each job is somewhat compute intensive. One of the reasons to go to AWS was to minimize infrastructure costs. I plan on having the frontend web site (Web API + ASP.net MVC4 site) run on elastic beanstalk. But I would prefer not to have a dedicated worker machine always online polling for jobs, since these workers need to be a bit "beefier" instance (for processing) and it would cost us a lot to mostly sit doing nothing.
Is there a way to only run the web portion on Beanstalk and then have the worker process spin up only if there are items in the queue? I realize I could have a micro "controller" instance always online polling and then have it control the compute spin-up, but even that seems like it shouldn't be needed. Can EC2 instances be started based on a non-zero SQS queue size? So basically the web API adds a job to the queue, something watches the queue and sees it's non-zero, this triggers the EC2 worker to start, it spins up and polls the queue on startup. It processes until the queue is empty, then something triggers it to shut down.
You can use Auto Scaling in conjunction with SQS to dynamically start and stop EC2 instances. There is an AWS blog post that describes the architecture you are thinking of.
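A rough sketch of the scale-from-zero wiring with boto3 (the queue name, Auto Scaling group, thresholds and policy settings are placeholders; the blog post may configure them differently):

    import boto3

    autoscaling = boto3.client("autoscaling")
    cloudwatch = boto3.client("cloudwatch")

    # Simple scaling policy: add one "beefy" worker instance when triggered.
    policy = autoscaling.put_scaling_policy(
        AutoScalingGroupName="worker-asg",  # placeholder Auto Scaling group (min size 0)
        PolicyName="scale-up-on-queue",
        AdjustmentType="ChangeInCapacity",
        ScalingAdjustment=1,
        Cooldown=300,
    )

    # CloudWatch alarm on the SQS backlog: fire when at least one message is visible.
    cloudwatch.put_metric_alarm(
        AlarmName="jobs-waiting-in-queue",
        Namespace="AWS/SQS",
        MetricName="ApproximateNumberOfMessagesVisible",
        Dimensions=[{"Name": "QueueName", "Value": "job-queue"}],  # placeholder queue
        Statistic="Sum",
        Period=60,
        EvaluationPeriods=1,
        Threshold=0,
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=[policy["PolicyARN"]],
    )

A mirror-image alarm on an empty queue can drive a scale-down policy back to zero, or the worker can terminate its own instance once it drains the queue.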