How can I make a Spring Boot application start daily and shut down after completing its process in GCP - amazon-web-services

I will be using Helm charts to deploy my application in GCP. I want to make my Spring Boot application shut down after completing its process, then on the next day start on its own at a particular time, complete the process, and shut down again. Is this possible in GCP?

You can use Cloud Run Jobs or Batch to do that, paying ONLY while your process consumes resources. (That is not the case with Kubernetes and GKE, where you pay for your cluster and its node pools even if nothing runs on them.)
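For the application side, the process has to exit on its own once the work is done instead of staying up as a server. Below is a minimal sketch of such a Spring Boot entry point, suitable for packaging as the container of a Cloud Run Job; DailyJob and doWork() are illustrative names, not anything from your codebase. The daily start time is then handled by scheduling the job (Cloud Run Jobs can be triggered on a cron schedule via Cloud Scheduler) rather than by the application itself.

```java
import org.springframework.boot.ApplicationArguments;
import org.springframework.boot.ApplicationRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.ConfigurableApplicationContext;

@SpringBootApplication
public class DailyJob implements ApplicationRunner {

    public static void main(String[] args) {
        ConfigurableApplicationContext ctx = SpringApplication.run(DailyJob.class, args);
        // Propagate the exit code so the scheduler (e.g. the Cloud Run Job) can detect failures.
        System.exit(SpringApplication.exit(ctx));
    }

    @Override
    public void run(ApplicationArguments args) {
        // Runs once at startup; when it returns, main() continues and the JVM exits.
        // With no spring-boot-starter-web on the classpath, no web server keeps the app alive.
        doWork();
    }

    private void doWork() {
        System.out.println("Processing...");  // placeholder for the real daily process
    }
}
```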

Related

Cost of running a small Kubernetes CronJob?

I am currently learning Kubernetes and would like to run a cron job every 6 hours (the job runs for under a minute). Minikube is not suitable, as I cannot ensure my laptop stays alive 24/7. I wonder what this type of workload costs on the main Kubernetes providers (GCP, AWS, Azure)? Or is it better to rent a VM and install a small Kubernetes instance on it?
Thanks.
Feedback from users with prior experience would be helpful.
You can have a look at Cloud Run and Cloud Run Jobs, which allow you to run containers in serverless mode.
In addition, you can also have a look at GKE Autopilot, where you pay only for the resources you consume on the cluster (and the first cluster is free).
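For scale: a workload like this is a single small CronJob manifest. A minimal sketch, assuming a pre-built container image (the image name and resource requests below are placeholders); on GKE Autopilot, the requested CPU and memory are what you pay for while the pod runs:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: periodic-job
spec:
  schedule: "0 */6 * * *"              # at the top of every 6th hour
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: worker
              image: gcr.io/my-project/worker:latest   # placeholder image
              resources:
                requests:              # Autopilot bills by requested resources
                  cpu: 250m
                  memory: 512Mi
          restartPolicy: OnFailure     # retry the pod if the job fails
```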

How can I run background worker process in Google Cloud?

As per the title, how can I run a background worker process in Google Cloud, like Heroku worker dynos?
I read the Google Cloud documentation, and the articles seem to assume I always want to deploy a web application. I don't want a web application at all. And then there is other documentation on Cloud Pub/Sub, Task Queues, Cloud Tasks, Cloud Functions, Cron, etc., which all seem to be different kinds of event-triggered one-off routines.
What I want is just a worker process that does stuff and updates the database, and that can gracefully shut down when requested, like on SIGTERM in Heroku.
Short answer: a container on Google Kubernetes Engine.
All the GCP solutions you mention require a trigger, whether from HTTP requests, events, tasks, or time, in order to run your code.
If you just want a job running in the background, you can create a container that runs a single never-ending process (e.g. Java, Node, etc.) and deploy it to GKE (check out DaemonSet and StatefulSet).
Alternative solution: Google Compute Engine.
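On the graceful-shutdown requirement: Kubernetes sends SIGTERM to the container before killing it (SIGKILL follows after the termination grace period), and the JVM runs shutdown hooks on SIGTERM. A minimal sketch of a worker loop that drains cleanly; Worker and processNextItem() are illustrative names:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

public class Worker {
    private static final AtomicBoolean running = new AtomicBoolean(true);
    private static final CountDownLatch stopped = new CountDownLatch(1);

    public static void main(String[] args) {
        // The JVM runs shutdown hooks on SIGTERM, so the loop can finish its current item.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            running.set(false);
            try {
                stopped.await();               // block JVM exit until the loop has drained
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));

        while (running.get()) {
            processNextItem();                 // e.g. poll a queue, update the database
        }
        stopped.countDown();
    }

    private static void processNextItem() {
        try {
            Thread.sleep(1000);                // placeholder for real work
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```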

How to run resource intensive tasks with Airflow

We have a long-running (3h) model-training task which runs every 3 days, and smaller prediction pipelines that run daily.
For both cases we use Jenkins + the EC2 plugin to spin up large instances (workers) and run the pipelines on them. This serves two purposes:
Keep pipelines isolated, so every pipeline has all the resources of one instance.
Save costs: large instances run only for several hours, not 24/7.
With Jenkins + the EC2 plugin, I am not responsible for copying code to the worker and reporting the result of the execution back; Jenkins does it under the hood.
Is there any way to achieve the same behaviour with Airflow?
Airflow 1.10 released a host of new AWS integrations that give you a few options for doing something like this on AWS:
https://airflow.apache.org/integration.html#aws-amazon-web-services
If you are running your tasks in a containerized setting, it sounds like the ECSOperator could be what you need (or the KubernetesPodOperator, if you're using Kubernetes): both launch a container per task, wait for it to finish, and report the result back to Airflow, which matches your isolate-then-terminate setup.

Schedule Docker image to be run periodically on AWS ECS?

How do I schedule a Docker image to be run periodically (hourly) using ECS, without having to use a continually running EC2 instance + cron? I have a Docker image containing third-party binaries and a Python project.
The latter approach is not viable long-term, as it's expensive to keep the instance running 24/7 when it is only used for a small fraction of the day: each invocation of the script lasts only ~3 minutes.
For an AWS ECS cluster, it is recommended to have at least one EC2 server running 24x7. Have you looked at whether AWS Fargate can run your Docker container? Also AWS Batch? If Fargate and AWS Batch are not possible, then for your requirement I would recommend something like this, without ECS:
Build an EC2 AMI with Docker and the required software and libraries pre-installed.
Have AWS Instance Scheduler spin up an EC2 server every hour and, as part of its user data, start a Docker container from the image you mentioned.
https://aws.amazon.com/answers/infrastructure-management/instance-scheduler/
If you know your task's execution time (say ~5 minutes), have the scheduler bring the server back down after 8 or 10 minutes.
The above approach will blindly start an EC2 instance and stop it without knowing whether your Python work completed successfully. It can still be improved with a combination of Lambda and CloudFormation templates. Let me know your thoughts :)
Actually, it's possible to schedule the launch directly in CloudWatch by defining a rule, as explained in
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/scheduled_tasks.html
This solution is cleaner, because you do not need to worry about the execution time: once finished, the task will simply terminate, and a new one will be spawned on the next cycle.
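If you keep your infrastructure in CloudFormation, the same scheduled task can be declared as an events rule. A hedged sketch, assuming the ECS cluster, the task definition, and an IAM role allowed to call ecs:RunTask already exist; every ARN below is a placeholder:

```yaml
Resources:
  HourlyTaskRule:
    Type: AWS::Events::Rule
    Properties:
      ScheduleExpression: rate(1 hour)
      State: ENABLED
      Targets:
        - Id: run-ecs-task
          # Placeholder ARNs; substitute your cluster, role, and task definition.
          Arn: arn:aws:ecs:us-east-1:123456789012:cluster/my-cluster
          RoleArn: arn:aws:iam::123456789012:role/ecsEventsRole
          EcsParameters:
            TaskDefinitionArn: arn:aws:ecs:us-east-1:123456789012:task-definition/my-task
            TaskCount: 1
```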

Amazon Web Services Data Pipeline

Can we use existing EC2 instance details while configuring a Data Pipeline? If so, what EC2 details do we need to provide while creating the pipeline?
Yes, it is possible, according to AWS support:
"You can install Task Runner on computational resources that you manage, such as an Amazon EC2 instance, or a physical server or workstation. Task Runner can be installed anywhere, on any compatible hardware or operating system, provided that it can communicate with the AWS Data Pipeline web service.
This approach can be useful when, for example, you want to use AWS Data Pipeline to process data that is stored inside your organization’s firewall. By installing Task Runner on a server in the local network, you can access the local database securely and then poll AWS Data Pipeline for the next task to run. When AWS Data Pipeline ends processing or deletes the pipeline, the Task Runner instance remains running on your computational resource until you manually shut it down. The Task Runner logs persist after pipeline execution is complete."
I did this myself, as it takes a while for the pipeline to start up; the start-up time can be 10-15 minutes, depending on unknown factors.
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-how-task-runner-user-managed.html
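The main detail to provide is a worker group: tag your pipeline's activities with a workerGroup value, then start Task Runner on your existing EC2 instance with the same value so it polls for those tasks. Per the linked guide, the invocation looks roughly like this (jar version, credentials path, region, group name, and log bucket are all placeholders):

```
java -jar TaskRunner-1.0.jar \
    --config ~/credentials.json \
    --workerGroup=myWorkerGroup \
    --region=us-east-1 \
    --logUri=s3://my-bucket/task-runner-logs
```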