I'm currently working on a project that I want to move into the cloud so I can scale to multiple users. Basically, I want to grab the name of every user from my database (I'm thinking about using Firestore for this), and for each of those names I want to call a Python script every 24 hours (I will use a Cloud Scheduler job). I envision using Cloud Run, since I would want multiple instances of the Python script running at the same time (one for every user), and I would want it to be serverless since I would only be running it once a day.
My questions are:
Are the cloud services I chose the correct ones for the job? For instance, is there a better service than Cloud Run to launch the script?
Is there built-in functionality to pass in entries from Firestore and launch a Cloud Run instance for each? I see, for instance, that the question "Does it make sense to run a non web application on cloud run?" is very close to mine, but it does not address the database element.
I'd like to call a Cloud Run app from inside a Cloud Function multiple times, given some logic. I've googled this quite a lot and haven't found good solutions. Is this supported?
I've seen the Workflows tutorials, but AFAIK they are meant to pass messages in series between different GCP services. My Cloud Function runs on a schedule every minute, and it would only need to call the Cloud Run app a few times per day, given some event. I've thought about having the entire app run in Cloud Run instead of the Cloud Function; however, I think having it all in Cloud Run would be more expensive than running the Cloud Function.
I went through your question, and I have an alternative in mind that might work for you: you can use Cloud Scheduler to securely trigger a Cloud Run service asynchronously on a schedule.
You need to create a service account to associate with Cloud Scheduler, and give that service account permission to invoke your Cloud Run service, i.e. the Cloud Run Invoker role (you can use an existing service account to represent Cloud Scheduler, or create a new one).
Next, create a Cloud Scheduler job that invokes your service at the specified times. Specify the frequency, or job interval, at which the job is to run, using a cron-style configuration string, and specify the fully qualified URL of your Cloud Run service, for example https://myservice-abcdef-uc.a.run.app. The job will send requests to this URL.
Next, specify the HTTP method; the method must match what your previously deployed Cloud Run service is expecting. When you deploy the Cloud Run service that Cloud Scheduler will invoke, make sure you do not allow unauthenticated invocations. Please go through this documentation for details and try to implement the steps.
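If you prefer to set the job up programmatically rather than through the console, here is a minimal sketch using the Cloud Scheduler Python client. The project, region, schedule, Cloud Run URL, and service account email below are placeholders, not values from your setup:
# Requires: pip install google-cloud-scheduler
from google.cloud import scheduler_v1

client = scheduler_v1.CloudSchedulerClient()
# Placeholder project and region
parent = client.common_location_path("my-project", "us-central1")

job = scheduler_v1.Job(
    name=f"{parent}/jobs/daily-cloud-run-trigger",
    schedule="0 3 * * *",  # every day at 03:00
    time_zone="Etc/UTC",
    http_target=scheduler_v1.HttpTarget(
        uri="https://myservice-abcdef-uc.a.run.app/",  # your Cloud Run URL
        http_method=scheduler_v1.HttpMethod.POST,
        # OIDC token so the authenticated-only Cloud Run service accepts the call
        oidc_token=scheduler_v1.OidcToken(
            service_account_email="scheduler-invoker@my-project.iam.gserviceaccount.com"
        ),
    ),
)

response = client.create_job(parent=parent, job=job)
print("Created job:", response.name)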
Back to your question: yes, it's possible to call your Cloud Run service from inside Cloud Functions. Here, your Cloud Run service is called from another backend service, i.e. Cloud Functions, directly (synchronously) over HTTP, using its endpoint URL. For this use case, you should make sure that each service is only able to make requests to specific services.
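As a sketch of that synchronous call, assuming a Python Cloud Function and a hypothetical /run path on your Cloud Run service, you can mint an OIDC identity token for the service URL and pass it as a Bearer token (the function's service account still needs the Cloud Run Invoker role on the target service):
# Inside the Cloud Function (Python runtime). Requires: google-auth, requests
import google.auth.transport.requests
import google.oauth2.id_token
import requests

CLOUD_RUN_URL = "https://myservice-abcdef-uc.a.run.app"  # placeholder service URL

def trigger_cloud_run(request):
    # Fetch an identity token for the Cloud Run service (the audience is its URL)
    auth_req = google.auth.transport.requests.Request()
    token = google.oauth2.id_token.fetch_id_token(auth_req, CLOUD_RUN_URL)

    # Call the service; "/run" is a placeholder path on your service
    resp = requests.post(
        f"{CLOUD_RUN_URL}/run",
        headers={"Authorization": f"Bearer {token}"},
        json={"source": "cloud-function"},
        timeout=60,
    )
    return resp.text, resp.status_code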
Go through this documentation suggested by @John Hanley, as it provides the steps you need to follow.
I need to run a data transformation pipeline that is composed of several scripts living in distinct projects (= Python repos).
I am thinking of using Compute Engine to run these scripts in VMs when needed, as I can manage the required resources.
I need to be able to orchestrate these scripts, in the sense that I want to run steps sequentially and sometimes asynchronously.
I see that GCP provides a Workflows component which seems to suit this case.
I am thinking of creating a specific project to orchestrate the executions of scripts.
However, I cannot see how I can trigger the execution of my scripts, which will not be in the same repo as the orchestrator project. From what I understand of GCE, VMs are only created when scripts are executed, and they provide no persistent HTTP endpoint that could be called to trigger execution from elsewhere.
To illustrate, let's say I have two projects, step_1 and step_2, which contain separate steps of my data transformation pipeline.
I would also have a project orchestrator whose only purpose is to trigger step_1 and step_2 sequentially in VMs with GCE. This project would not have access to the code repos of the two other projects.
What would be the best practice in this case? Should I use components other than GCE and Workflows for this, or is there a way to trigger scripts in GCE from an independent orchestration project?
One possible solution would be to not use GCE (Google Compute Engine) but instead create Docker containers that contain your task steps. These would then be deployed to Cloud Run. Cloud Run spins up containers on demand and charges you only for the time spent processing a request; when the request ends, you are no longer charged, so you consume resources optimally. Various events can trigger a request to a Cloud Run service, but the most common is a REST call. With this in mind, assume your Python code is packaged in a container and fronted by a small REST server (e.g. Flask). Effectively you have created "microservices". These services can then be orchestrated by Cloud Workflows. The invocation of these microservices happens through REST endpoints, which can be Internet-facing addresses protected by authorization. This allows the microservices (tasks/steps) to live in separate GCP projects while the orchestrator sees them as distinct callable endpoints.
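As a rough sketch of what one such microservice could look like (the route name and the run_transformation function are placeholders for your existing step code), the container would simply expose the step behind a small Flask endpoint:
# app.py - minimal Flask wrapper around one pipeline step (placeholder logic)
import os
from flask import Flask, request, jsonify

app = Flask(__name__)

def run_transformation(params):
    # Placeholder for your existing step_1 / step_2 code
    return {"rows_processed": 0, "params": params}

@app.route("/run", methods=["POST"])
def run_step():
    params = request.get_json(silent=True) or {}
    result = run_transformation(params)
    return jsonify({"status": "done", "result": result})

if __name__ == "__main__":
    # Cloud Run injects the port to listen on via the PORT environment variable
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", "8080")))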
Other potential solutions to look at would be GKE (Kubernetes) and Cloud Composer (Apache Airflow).
If you DO wish to stay with Compute Engine, you can still do that using a Shared VPC. Shared VPC allows distinct projects to have network connectivity between each other, and you could use Private Catalog to have the GCE instances advertise themselves to each other. You could then have a GCE instance do the orchestration or, again, orchestrate through Cloud Workflows. We would have to check whether Cloud Workflows supports parallel steps; I do not believe it does as of the time of this post.
This is a common request: to organize automation into its own project. You can set up a service account that spans multiple projects.
See a tutorial here: https://gtseres.medium.com/using-service-accounts-across-projects-in-gcp-cf9473fef8f0
On top of that, you can also consider having Workflows in both the orchestrator and the sub-level projects. This way the orchestrator Workflow can call another Workflow, so each job can easily be run and encapsulated under the project that holds the code plus the workflow body, and only the triggering comes from the other project.
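For illustration, assuming the orchestrator's service account has been granted the Workflows Invoker role in the sub-level project, triggering that project's workflow from Python could look roughly like this (the project, location, and workflow names are placeholders):
# Requires: pip install google-cloud-workflows
from google.cloud.workflows import executions_v1

client = executions_v1.ExecutionsClient()

# Workflow living in the sub-level project (placeholder names)
parent = "projects/sublevel-project/locations/us-central1/workflows/step-1-workflow"

# Optional JSON argument passed to the workflow execution
execution = executions_v1.Execution(argument='{"run_date": "2024-01-01"}')

response = client.create_execution(parent=parent, execution=execution)
print("Started execution:", response.name)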
I have a business requirement for a microservice that takes a list of CSV files and uses those files to update the database it is connected to. This happens once a month; there is no endpoint, and the service does not need to keep running. The app starts, does its job of populating the database from the CSV files, and is done.
Can I use AWS Lambda for this? I already have a Spring Boot project that does the job, but we want to minimise cost rather than running the service on EC2, which isn't necessary since the app only needs to run once a month. I need the best way to do the job at minimum cost.
PS: the DB will also reside in AWS.
Use case: our requirement is to run a service repeatedly, every few minutes. The service reads a value from Datastore and hits a public URL using that value (stateful). It has no front end, and nobody would access it publicly. A new value is stored in Datastore as a result of the response from the URL. Exactly one server instance should be running.
We need to decide on one of the options below for our use case.
Compute Engine (IaaS: we don't want to maintain the infrastructure for this simple stateful application)
Kubernetes Engine (still feels like overkill)
App Engine (PaaS): App Engine is usually used for mobile apps, gaming, and websites, and it provides a URL with a web address. Is it the right choice for our use case? If we choose App Engine, is it possible to disable the public App Engine URL? Also, since one instance would be running continuously in App Engine, which is more cost-effective: standard or flexible?
Cloud Functions: event-driven (doesn't look suitable for our application)
Google Cloud Scheduler: we thought we could use Cloud Scheduler + Cloud Functions, but during an outage jobs are queued up. In our case, after an outage only one server/instance/job should be up and running.
Thanks!
after outage, only one server/instance/job could be up and running
Is limiting Cloud Function concurrency enough? If so, you can do this:
gcloud functions deploy FUNCTION_NAME --max-instances 1 FLAGS...
https://cloud.google.com/functions/docs/max-instances
I also recommend taking a look at Google Cloud Run, which is a serverless Docker platform. It can be limited to a maximum of 1 instance responding to at most 1 concurrent request. It would require Cloud Scheduler too, making regular HTTP requests to it.
With both services configured with a max concurrency of 1, only one server/instance/job will be up and running. However, after an outage, queued jobs may be scheduled as soon as another finishes. If this is problematic, add a lastRun datetime field to the Datastore job row and skip the run if it is too recent (a sketch follows below), or disable retries in Cloud Scheduler, as described here:
Google Cloud Tasks HTTP trigger - how to disable retry
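A minimal sketch of that lastRun guard, assuming Datastore and a hypothetical "Job" kind, could look like this:
# Requires: pip install google-cloud-datastore
from datetime import datetime, timedelta, timezone
from google.cloud import datastore

client = datastore.Client()

def should_run(job_name="poll-public-url", min_interval=timedelta(minutes=5)):
    key = client.key("Job", job_name)
    entity = client.get(key) or datastore.Entity(key=key)
    last_run = entity.get("lastRun")
    now = datetime.now(timezone.utc)
    if last_run is not None and now - last_run < min_interval:
        return False  # ran too recently, e.g. catch-up jobs queued during an outage
    entity["lastRun"] = now
    client.put(entity)
    return True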
1. Summarize the problem
Google Cloud Run advertises "stateless containers". Is there a way to run anything at all and have it save state somewhere?
I want to run Postgres in a container, but only have it up on demand: spin up the PG container when a request is made.
The same question goes for a container that holds a REST API (web server) connecting to the PG container.
So when the web app (hosted on Firebase) makes a request to the REST API (container), the API container would spin up, and then the PG instance queried by the REST API would spin up (or I could simply put both the DB and the REST API in one container).
For a dev instance, I don't want something up 24x7x365 doing mostly nothing; I just want something that spins up during development hours. I will have a number of these, and as the only ops person I want to automate it for the developers, including myself, and minimize billing.
Any advice on the best approach here would be appreciated.
2. Provide background including what you've already tried
I have created Docker containers and deployed them to Cloud Run.
3. Show some code
yum install buildah podman -y
4. Describe expected and actual results including any error messages
I am looking for a solution to minimize billing for a dev environment that will include hosting and a database/REST API (database has to be Postgres).
I'm looking for a stateful Cloud Run setup that will maintain the state of a database.
Cloud Run is not suitable for hosting a database. Server instances allocated for incoming requests to Cloud Run can come and go, and not all requests will go to the same instance, which means that not all clients will see the same data. That's the problem with "stateless containers".
If you want to use Cloud Run to provide database access, it would be best as a proxy to some other cloud-hosted database service. You might use it to host a REST API endpoint that accesses another database service (for example, Cloud Firestore or Cloud SQL). But it doesn't make sense to host the database itself in your Docker image, since those server instances can come and go unpredictably, destroying any database state stored in each instance.
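As a small illustration of that pattern (the collection name and route here are made up), a Cloud Run service could expose a read endpoint that delegates all state to Cloud Firestore instead of keeping it in the container:
# Cloud Run service that keeps no state itself; all data lives in Firestore
# Requires: pip install flask google-cloud-firestore
import os
from flask import Flask, jsonify
from google.cloud import firestore

app = Flask(__name__)
db = firestore.Client()

@app.route("/users/<user_id>", methods=["GET"])
def get_user(user_id):
    # Read from Firestore on every request; no local state survives instance churn
    doc = db.collection("users").document(user_id).get()
    if not doc.exists:
        return jsonify({"error": "not found"}), 404
    return jsonify(doc.to_dict())

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", "8080")))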