Google Cloud Platform design for a stateful application - google-cloud-platform

Use case: our requirement is to run a service every few minutes, continuously. This service reads a value from Datastore and hits a public URL using that value (stateful). The service doesn't have a front end, and nobody would access it publicly. A new value is stored in Datastore as a result of the response from the URL. Exactly one server is required to run.
We need to decide between the options below for our use case.
Compute Engine (IaaS): we don't want to maintain the infrastructure for this simple stateful application.
Kubernetes Engine: still feels like overkill.
App Engine (PaaS): App Engine is usually used for mobile apps, gaming, and websites, and it provides a URL with a web address. Is it the right choice for our use case? If we choose App Engine, is it possible to disable the public App Engine URL? Also, as one instance would be running continuously in App Engine, which is more cost effective: standard or flexible?
Cloud Functions: event driven; looks unsuitable for our application.
Cloud Scheduler: we thought we could use Cloud Scheduler + Cloud Functions, but during an outage, jobs are queued up. In our case, after an outage, only one server/instance/job should be up and running.
Thanks!

after outage, only one server/instance/job could be up and running
Would limiting Cloud Functions concurrency be enough? If so, you can do this:
gcloud functions deploy FUNCTION_NAME --max-instances 1 FLAGS...
https://cloud.google.com/functions/docs/max-instances
I also recommend taking a look at Google Cloud Run, a serverless Docker platform. It can be limited to a maximum of 1 instance responding to a maximum of 1 concurrent request. It would require Cloud Scheduler too, making regular HTTP requests to it.
With either service configured with a maximum concurrency of 1, only one server/instance/job will be up and running. However, after an outage, queued jobs may be scheduled as soon as the previous one finishes. If this is problematic, add a lastRun datetime field to the Datastore job row and skip the run if it is too recent, or disable retries in Cloud Scheduler, as described here:
Google Cloud Tasks HTTP trigger - how to disable retry
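The lastRun guard suggested above can be sketched as follows. This is a minimal sketch, not a full implementation: the Datastore entity is stood in for by an in-memory dict, and the 5-minute interval, field name, and function name are all assumptions; in a real function you would read and write the entity with the google.cloud.datastore client.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical stand-in for the Datastore entity that holds the job
# state; in a real deployment this would be a Datastore lookup/put.
job_state = {"lastRun": None}

MIN_INTERVAL = timedelta(minutes=5)  # assumed schedule interval

def should_run(now=None):
    """Return True only if the previous run is absent or old enough.

    This suppresses the back-to-back executions that queued-up Cloud
    Scheduler invocations would otherwise cause after an outage.
    """
    now = now or datetime.now(timezone.utc)
    last = job_state["lastRun"]
    if last is not None and now - last < MIN_INTERVAL:
        return False
    job_state["lastRun"] = now  # persist back to Datastore in practice
    return True
```

The function entry point would call `should_run()` first and exit immediately when it returns False, so a burst of queued invocations collapses into a single effective run.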

Related

Can you call a Cloud Run app from inside a Cloud Function?

I'd like to call a Cloud Run app from inside a Cloud Function multiple times, based on some logic. I've googled this quite a lot and haven't found good solutions. Is this supported?
I've seen the Workflows tutorials, but AFAIK they are meant to pass messages in series between different GCP services. My Cloud Function runs on a schedule every minute, and it would only need to call the Cloud Run app a few times per day, given some event. I've thought about having the entire app run in Cloud Run instead of the Cloud Function. However, I think having it all in Cloud Run would be more expensive than running the Cloud Function.
I went through your question, and I have an alternative in mind if you agree with the solution: you can use Cloud Scheduler to securely trigger a Cloud Run service asynchronously on a schedule.
You need to create a service account to associate with Cloud Scheduler and give that service account permission to invoke your Cloud Run service, i.e. the Cloud Run Invoker role. (You can use an existing service account to represent Cloud Scheduler, or you can create a new one for that matter.)
Next, you have to create a Cloud Scheduler job that invokes your service at specified times. Specify the frequency, or job interval, at which the job is to run, using a configuration string. Specify the fully qualified URL of your Cloud Run service, for example https://myservice-abcdef-uc.a.run.app; the job will send requests to this URL.
Next, specify the HTTP method: the method must match what your previously deployed Cloud Run service is expecting. When you deploy the service that Cloud Scheduler will trigger, make sure you do not allow unauthenticated invocations. Please go through this documentation for details and try to implement the steps.
Back to your question: yes, it's possible to call your Cloud Run service from inside Cloud Functions. Here, your Cloud Run service is called from another backend service, i.e. Cloud Functions, directly (synchronously) over HTTP, using its endpoint URL. For this use case, you should make sure that each service is only able to make requests to specific services.
Go through this documentation suggested by @John Hanley, as it provides the steps you need to follow.
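The direct synchronous call described above can be sketched with only the standard library. Note the assumptions: `fetch_id_token` here is a placeholder (a real Cloud Function would obtain an OIDC identity token for the service URL, for example via `google.oauth2.id_token.fetch_id_token` or the metadata server), and the URL is the example one from the answer above.

```python
import urllib.request

CLOUD_RUN_URL = "https://myservice-abcdef-uc.a.run.app"  # example URL

def fetch_id_token(audience):
    """Placeholder token fetch. In a real Cloud Function, obtain an
    OIDC token minted for `audience` from Google's auth libraries or
    the metadata server; the value returned here is fake."""
    return "FAKE_TOKEN_FOR_" + audience

def build_invocation(url=CLOUD_RUN_URL):
    """Build an authenticated POST request to the Cloud Run service.

    Calling urllib.request.urlopen(req) would actually send it; the
    Cloud Run service verifies the bearer token and rejects callers
    that lack the Cloud Run Invoker role.
    """
    token = fetch_id_token(url)
    return urllib.request.Request(
        url,
        data=b"{}",
        method="POST",
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
    )
```

The Cloud Function would call `urllib.request.urlopen(build_invocation())` only when its scheduled logic decides the Cloud Run app needs to be invoked.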

GCP components to orchestrate crons running in GCE (Google Workflows?)

I need to run a data-transformation pipeline that is composed of several scripts living in distinct projects (Python repos).
I am thinking of using Compute Engine to run these scripts in VMs when needed as I can manage resources required.
I need to be able to orchestrate these scripts, in the sense that I want to run steps sequentially and sometimes asynchronously.
I see that GCP provides a Workflows component, which seems to suit this case.
I am thinking of creating a specific project to orchestrate the executions of scripts.
However, I cannot see how I can trigger the execution of my scripts, which will not be in the same repo as the orchestrator project. From what I understand of GCE, VMs are only created when scripts are executed and provide no persistent HTTP endpoint that can be called to trigger execution from elsewhere.
To illustrate, let say I have two projects step_1 and step_2 which contain separate steps of my data transformation pipeline.
I would also have a project orchestrator whose only purpose is to trigger step_1 and step_2 sequentially in VMs with GCE. This project would not have access to the code repos of the two other projects.
What would be the best practice in this case? Should I use components other than GCE and Workflows for this, or is there a way to trigger scripts in GCE from an independent orchestration project?
One possible solution would be to not use GCE (Google Compute Engine) but instead create Docker containers that contain your task steps and register them with Cloud Run. Cloud Run spins up Docker containers on demand and charges you only for the time spent processing a request; when the request ends, you are no longer charged, so you consume resources optimally. Various events can cause a request in Cloud Run, but the most common is a REST call.
With this in mind, now assume that your Python code is packaged in a container and triggered by a REST server (e.g. Flask). Effectively you have created "microservices". These services can then be orchestrated by Cloud Workflows. The microservices are invoked through REST endpoints, which can be Internet addresses with authorization also present. This allows the microservices (tasks/steps) to be located in separate GCP projects, with the orchestrator seeing them as distinct callable endpoints.
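A minimal sketch of such a REST wrapper, using only the standard library's WSGI interface instead of Flask; the `/run` path, the step name, and the response body are assumptions made for illustration:

```python
def run_step(payload):
    """Placeholder for one data-transformation step (step_1, say)."""
    return b'{"status": "step_1 done"}'

def app(environ, start_response):
    """Minimal WSGI wrapper exposing the step as a REST endpoint.

    An orchestrator (e.g. Cloud Workflows) would POST to /run on the
    container's endpoint URL; any other request gets a 404.
    """
    if environ.get("PATH_INFO") == "/run" and environ.get("REQUEST_METHOD") == "POST":
        body = run_step(environ.get("wsgi.input"))
        start_response("200 OK", [("Content-Type", "application/json")])
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

Inside the container you would serve `app` with any WSGI server (e.g. `wsgiref.simple_server.make_server("", 8080, app).serve_forever()` or gunicorn), listening on the port Cloud Run provides via `$PORT` (8080 by default).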
Other potential solutions to look at would be GKE (Kubernetes) and Cloud Composer (Apache Airflow).
If you DO wish to stay with Compute Engine, you can still do that using a Shared VPC. Shared VPC allows distinct projects to have network connectivity with each other, and you could use Private Catalog to have the GCE instances advertise to each other. You could then have a GCE instance choreograph, or, again, choreograph through Cloud Workflows. We would have to check whether Cloud Workflows supports parallel items; I do not believe it does as of the time of this post.
This is a common request: organizing automation into its own project. You can set up a service account that spans multiple projects.
See a tutorial here: https://gtseres.medium.com/using-service-accounts-across-projects-in-gcp-cf9473fef8f0
On top of that, you can also have Workflows in both the orchestrator and the sub-level projects. That way the orchestrator Workflow can call another Workflow, so each job can be run easily and stay encapsulated in the project that holds its code and workflow body, with only the triggering coming from the other project.

Can cloud functions like AWS Lambdas or Google Cloud Function access databases from other servers?

I have a web app and a database that aren't hosted on any cloud service, just on a regular hosting platform.
I need to build an API to read from and write to that database, and I want to use cloud functions to do so. Is it possible to connect to a remote database from cloud functions (such as AWS Lambda or Google Cloud Functions) even when the database isn't hosted on that cloud service?
If so, can there be problems with doing so?
Cloud Functions are just Node.js code that runs in a managed environment. This means your code can do almost anything Node.js scripts can do, as long as you stay within the restrictions of that environment.
I've seen people connect to many other database services, both within Google Cloud Platform and outside of it. The main restriction to be aware of is that you'll need to be on a paid plan in order to call APIs that are not running on Google Cloud Platform.
Yes it's possible.
If so, can there be problems with doing so?
There could be high latency if the database is in a different network. Also, long-lived database connection pools don't work well in these environments, because function instances are created and destroyed constantly. Finally, if your function reaches a high level of concurrency, you may exhaust the number of available connections on your database server.
You can use FaaS the same way as a web service hosted on any web server or cloud server.
You have to be careful with the duration of your calls to the DB, because FaaS functions are limited in time (15 minutes for AWS Lambda and 9 minutes on Google), and you should configure the firewall properly on your DB server.
A container for your Lambda function can be reused, and you can use some tricks with that: Best Practices for AWS Lambda Container Reuse.
But you can't be sure that nothing changed between invocations of your service.
You can read some good advice about this here: https://stackoverflow.com/a/37524237/182344
PS: Azure Functions have an Always On setting, but I am not sure how pooling works in that case.
Yes, you can access on-premises resources from serverless products.
Please check this detailed tutorial, where you can find three methods to achieve your goal:
Connecting using a VPN
Connecting using a Partner Interconnect
Connecting using an Interconnect solution

How to use Google Cloud Task outside of App Engine?

I am building a Python app in Google Cloud. This involves delayed execution of tasks.
It seems Cloud Tasks is limited to App Engine.
Can we use Cloud Tasks from GCE VMs or from containers running in GCP or other clouds' VMs?
Even the Google docs only cover push queues with App Engine.
Does Cloud Tasks support pull queues?
[EDIT]
I tried looking at the Cloud Discovery files: v2beta1 has pull references but v2 does not. I believe GCP doesn't want to support this in the future :-(.
Cloud Tasks does not support pull queues, but it just launched a beta feature for HTTP targets, which allows Cloud Tasks to push tasks to any HTTP endpoint. There's even functionality for Cloud Tasks to include an authentication token based on an associated service account: https://cloud.google.com/tasks/docs/creating-http-target-tasks
This allows you to push to GCE, or really to any service that can operate as a webhook. If you use the new Cloud Run beta product, verifying these tokens is handled for you.
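For illustration, the request body for creating such an HTTP-target task via the Cloud Tasks v2 REST API might look like the following. The field names follow the HTTP target documentation linked above, but treat the exact payload shape, and all the placeholder values, as assumptions to verify against that page.

```python
import base64
import json

# Sketch of a tasks.create request body for an HTTP target. The URL
# can point at a GCE-hosted webhook, a Cloud Run service, or any
# reachable HTTP endpoint; the service account email is a placeholder.
task = {
    "task": {
        "httpRequest": {
            "httpMethod": "POST",
            "url": "https://example.com/task-handler",
            # Cloud Tasks attaches an OIDC token minted for this
            # service account; the receiving service verifies it.
            "oidcToken": {
                "serviceAccountEmail": "my-sa@my-project.iam.gserviceaccount.com"
            },
            # The task body must be base64-encoded in the REST API.
            "body": base64.b64encode(json.dumps({"job": 1}).encode()).decode(),
        }
    }
}
```

This JSON would be POSTed to the queue's `tasks` collection (or passed to the `google-cloud-tasks` client library, which wraps the same API).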
Cloud Pub/Sub provides support for pull-based processing.

How can I run background worker process in Google Cloud?

As per the title, how can I run a background worker process in Google Cloud, like Heroku worker dynos?
I read the Google Cloud documentation, and the articles seem to assume I always want to deploy a web application. I don't want a web application at all. Then there is other documentation on Cloud Pub/Sub, Task Queues, Cloud Tasks, Cloud Functions, Cron, etc., which all seem to be different kinds of event-triggered one-off routines.
What I want is just a worker process that does stuff, updates the database, and can gracefully shut down when requested, like SIGTERM on Heroku.
Short answer: A container on Google Kubernetes Engine.
All the GCP solutions you mention require a trigger, whether from HTTP requests, events, tasks, or time, in order to run your code.
If you just want a job running in the background, you can create a container that runs a single never-ending process (e.g. Java, Node, etc.) and deploy it to GKE (check out DaemonSet and StatefulSet).
Alternative solution: Google Compute Engine.