I am developing a Spring Batch application and want to deploy it on AWS in a way that uses minimal resources: as soon as the batch completes, the resources should be terminated.
Please suggest the best AWS service for this. My job takes 1-2 hours to run.
Edit: AWS Batch seems like the right option, but I am not sure how inter-node communication will work. Spring Batch uses messaging middleware for inter-node communication, whereas AWS Batch suggests IP-based MPI, Apache MXNet, etc. for the same purpose.
I have the below 2 requirements; can you please share any suggestions?
AWS Batch triggers an ECS Fargate task (on demand)
AWS Batch triggers a Spring app deployed on ECS Fargate (running permanently)
So here, for Option 1, I need to start a Spring Boot app that should run on ECS Fargate. I understood from AWS Batch that we can specify the Fargate cluster so that when the AWS Batch job runs, the app gets started.
For Option 2, I have a Spring Boot app deployed on ECS Fargate that is always running, and Spring Batch is inside it. Now AWS Batch needs to trigger the Spring Batch job. Is this possible? If so, can you please share a sample implementation?
Also, from my client app or program I need to update AWS Batch, saying the job succeeded or failed. Can you share a sample for that as well?
AWS Batch only executes ECS tasks, not ECS services. For option 1 - to launch a container (your app that does the work you want) within ECS Fargate, you need to specify an AWS Batch compute environment of type Fargate, a job queue that references that compute environment, and a job definition for the task (what container to run, what command to send, what CPU and memory resources are required). See the Learn AWS Batch workshop or the AWS Batch Getting Started documentation for more information on this.
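For a concrete picture, here is a minimal sketch with the AWS SDK for Java v2; the job definition name, queue name, image URI, role ARN, and resource sizes are all placeholders, not values AWS prescribes:

```java
import software.amazon.awssdk.services.batch.BatchClient;
import software.amazon.awssdk.services.batch.model.*;

public class SubmitSpringBatchJob {
    public static void main(String[] args) {
        BatchClient batch = BatchClient.create();

        // Job definition: which image to run on Fargate and with what resources.
        // All names, ARNs, and sizes below are placeholders.
        batch.registerJobDefinition(RegisterJobDefinitionRequest.builder()
                .jobDefinitionName("spring-batch-job")
                .type(JobDefinitionType.CONTAINER)
                .platformCapabilities(PlatformCapability.FARGATE)
                .containerProperties(ContainerProperties.builder()
                        .image("123456789012.dkr.ecr.us-east-1.amazonaws.com/spring-batch-app:latest")
                        .executionRoleArn("arn:aws:iam::123456789012:role/ecsTaskExecutionRole")
                        .command("java", "-jar", "app.jar")
                        .resourceRequirements(
                                ResourceRequirement.builder().type(ResourceType.VCPU).value("1").build(),
                                ResourceRequirement.builder().type(ResourceType.MEMORY).value("2048").build())
                        .networkConfiguration(NetworkConfiguration.builder()
                                .assignPublicIp(AssignPublicIp.ENABLED)
                                .build())
                        .build())
                .build());

        // Submit a job to a queue backed by the Fargate compute environment;
        // the task is created on demand and torn down when the container exits.
        batch.submitJob(SubmitJobRequest.builder()
                .jobName("spring-batch-run")
                .jobQueue("my-fargate-queue")
                .jobDefinition("spring-batch-job")
                .build());
    }
}
```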
For option 2 - AWS Batch and Spring Batch are orthogonal solutions. You should either call the Spring Batch API endpoint directly OR rely on AWS Batch. Using both is not recommended unless this is something you don't have control over.
But to answer your question - calling a non-AWS API endpoint is handled in your container and application code. AWS Batch does not prevent this, but you would need to make sure that the container has secure access to the proper credentials to call the Spring Boot app. Once your Batch job calls the API, you have two choices:
Immediately exit and track the status of the Spring Batch operations elsewhere (i.e. the Batch job only calls the API, and SUCCESS = "able to send the API request successfully", FAIL = "not able to call the API").
Call the API, then enter a loop where you poll the status of the Spring Batch job until it completes successfully or not, exiting the AWS Batch job with the same state as the Spring Batch job; a sketch of this pattern follows below.
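As a rough illustration of that second choice (using java.net.http from the JDK): the /jobs/... endpoints, the execution-id response body, and the COMPLETED/FAILED status strings below are all hypothetical, since Spring Batch does not ship a REST API of its own - they stand in for whatever controller your Spring Boot app actually exposes.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class TriggerAndPoll {
    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();
        String baseUrl = "http://spring-app.internal:8080"; // placeholder host

        // 1. Trigger the Spring Batch job via the app's (hypothetical) REST endpoint.
        HttpResponse<String> start = http.send(
                HttpRequest.newBuilder(URI.create(baseUrl + "/jobs/import"))
                        .POST(HttpRequest.BodyPublishers.noBody())
                        .build(),
                HttpResponse.BodyHandlers.ofString());
        if (start.statusCode() != 200) {
            System.exit(1); // FAIL: not able to call the API
        }
        String executionId = start.body().trim(); // assumed to return an execution id

        // 2. Poll until Spring Batch reports a terminal state.
        while (true) {
            HttpResponse<String> status = http.send(
                    HttpRequest.newBuilder(URI.create(baseUrl + "/jobs/" + executionId + "/status"))
                            .GET().build(),
                    HttpResponse.BodyHandlers.ofString());
            String state = status.body().trim();
            if ("COMPLETED".equals(state)) System.exit(0); // AWS Batch job succeeds
            if ("FAILED".equals(state)) System.exit(1);    // AWS Batch job fails
            Thread.sleep(30_000); // wait 30s between polls
        }
    }
}
```

Because AWS Batch takes the job's final status from the container's exit code, the System.exit(...) calls are the only "update AWS Batch with success or failure" your client program needs.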
I am new to AWS. I have developed a batch processing application using Spring Boot, Spring Batch, and Quartz for scheduling. It pulls a file from a remote FTP server and loads it into a DB. Can someone please help: how can I deploy this to AWS?
Would this be on an Elastic Beanstalk (EC2) instance with a volume mounted to it for downloading the file from FTP and then processing it?
Thanks,
You should probably look at the official Spring project, Spring Cloud AWS. It is a somewhat involved process, but powerful. If your job is a small one, you may wish to consider an AWS Lambda function instead.
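If the job is small enough to fit Lambda's execution-time limit (15 minutes), the handler can be a thin wrapper around your existing logic. A minimal sketch, assuming a scheduled EventBridge (CloudWatch Events) rule replaces Quartz; the FTP and DB helpers below are stubs standing in for your own code:

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

// Minimal sketch of the FTP-to-DB job as a Lambda function, triggered on a
// schedule by an EventBridge rule instead of Quartz.
public class FtpToDbHandler implements RequestHandler<Object, String> {

    @Override
    public String handleRequest(Object event, Context context) {
        byte[] file = fetchFromFtp("ftp.example.com", "/outbox/data.csv"); // placeholder host/path
        int rows = loadIntoDb(file);
        context.getLogger().log("Loaded " + rows + " rows");
        return "OK";
    }

    // Stubs standing in for your existing FTP-download and DB-load logic.
    private byte[] fetchFromFtp(String host, String path) { return new byte[0]; }
    private int loadIntoDb(byte[] file) { return 0; }
}
```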
Can we use existing EC2 instance details while configuring a Data Pipeline? If it is possible, what EC2 details do we need to provide while creating the pipeline?
Yes, it is possible, according to AWS support:
"You can install Task Runner on computational resources that you manage, such as an Amazon EC2 instance, or a physical server or workstation. Task Runner can be installed anywhere, on any compatible hardware or operating system, provided that it can communicate with the AWS Data Pipeline web service.
This approach can be useful when, for example, you want to use AWS Data Pipeline to process data that is stored inside your organization’s firewall. By installing Task Runner on a server in the local network, you can access the local database securely and then poll AWS Data Pipeline for the next task to run. When AWS Data Pipeline ends processing or deletes the pipeline, the Task Runner instance remains running on your computational resource until you manually shut it down. The Task Runner logs persist after pipeline execution is complete."
I did this myself, as it takes a while for a pipeline to start up; this start-up time can be 10-15 minutes depending on unknown factors.
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-how-task-runner-user-managed.html
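To make the user-managed setup concrete, here is a rough sketch with the AWS SDK for Java v2 that defines an on-demand pipeline whose activity is bound to a worker group instead of a pipeline-managed EC2 resource. The pipeline name, worker-group name (my-ec2-worker), and command are placeholders, and the Task Runner on your own instance has to be started with the matching --workerGroup value:

```java
import software.amazon.awssdk.services.datapipeline.DataPipelineClient;
import software.amazon.awssdk.services.datapipeline.model.*;

public class UserManagedPipeline {
    public static void main(String[] args) {
        DataPipelineClient dp = DataPipelineClient.create();

        String pipelineId = dp.createPipeline(CreatePipelineRequest.builder()
                .name("my-pipeline")           // placeholder name
                .uniqueId("my-pipeline-token") // idempotency token
                .build()).pipelineId();

        PipelineObject defaults = PipelineObject.builder()
                .id("Default").name("Default")
                .fields(Field.builder().key("scheduleType").stringValue("ondemand").build())
                .build();

        // A ShellCommandActivity routed to a user-managed worker group: the
        // Task Runner you started with --workerGroup=my-ec2-worker polls
        // Data Pipeline and picks this task up on your own instance.
        PipelineObject activity = PipelineObject.builder()
                .id("MyActivity").name("MyActivity")
                .fields(
                        Field.builder().key("type").stringValue("ShellCommandActivity").build(),
                        Field.builder().key("command").stringValue("echo hello").build(),
                        Field.builder().key("workerGroup").stringValue("my-ec2-worker").build())
                .build();

        dp.putPipelineDefinition(PutPipelineDefinitionRequest.builder()
                .pipelineId(pipelineId)
                .pipelineObjects(defaults, activity)
                .build());
        dp.activatePipeline(ActivatePipelineRequest.builder()
                .pipelineId(pipelineId).build());
    }
}
```

Because the work runs on your already-running instance, you also skip the 10-15 minute resource start-up mentioned above.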
I'm migrating a Java web app from GAE to AWS. I have a dozen cron jobs, each with a different schedule (one of them in a different timezone). It was quite an easy task on GAE: just append the servlet and the time to execute to the cron.xml file.
I know how to set up a cron job on Linux, but I'm guessing AWS OpsWorks is better suited for this task. How do I attach a Java program to OpsWorks?
I haven't seen any way to run a command (I used Jenkins)
OpsWorks uses Chef for configuration, so you'd need to write a custom Chef recipe to manage your crons. There are public cookbooks available where most of it is already automated:
https://github.com/opscode-cookbooks/cron
Then you can run your recipe via OpsWorks.
I am trying to understand how to deploy an Amazon Kinesis client application that was built using the Kinesis Client Library (KCL).
I found this, but it only states:
You can follow your own best practices for deploying code to an Amazon EC2 instance when you deploy a Amazon Kinesis application. For example, you can add your Amazon Kinesis application to one of your Amazon EC2 AMIs.
which does not give me the broader picture.
These examples use an Ant script to run the Java program. Is this the best practice to follow?
Also, I understand that even before running the EC2 instances I need to make sure that:
the developed code (JAR/WAR or any other format) is on the EC2 instance, and
the EC2 instance already has the required environment, such as Ant, set up to execute the program.
Could someone please add some more detail on this?
Amazon Kinesis is responsible for ingesting data, not for running your application. You can run your application anywhere, but it is a good idea to run it on EC2, as you are probably going to use other AWS services such as S3 or DynamoDB (the Kinesis Client Library uses DynamoDB to track shard leases and checkpoints, for example).
To understand Kinesis better, I'd recommend launching the Kinesis Data Visualization Sample. When you launch this app, use the provided CloudFormation template: it creates a stack with a Kinesis stream and an EC2 instance running the application, which uses the Kinesis Client Library and is a fully working example to start from.
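For orientation, the consumer side of a KCL application boils down to something like this sketch (KCL 1.x classes; the application name, stream name, and worker id are placeholders, and the record processor is reduced to a stub). Note that the application name doubles as the name of the DynamoDB lease table:

```java
import java.nio.charset.StandardCharsets;
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
import com.amazonaws.services.kinesis.clientlibrary.interfaces.v2.IRecordProcessor;
import com.amazonaws.services.kinesis.clientlibrary.interfaces.v2.IRecordProcessorFactory;
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.KinesisClientLibConfiguration;
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker;
import com.amazonaws.services.kinesis.clientlibrary.types.InitializationInput;
import com.amazonaws.services.kinesis.clientlibrary.types.ProcessRecordsInput;
import com.amazonaws.services.kinesis.clientlibrary.types.ShutdownInput;

public class SampleConsumer {
    public static void main(String[] args) {
        // Leases and checkpoints go into a DynamoDB table named "my-kinesis-app".
        KinesisClientLibConfiguration config = new KinesisClientLibConfiguration(
                "my-kinesis-app",                        // placeholder application name
                "my-stream",                             // placeholder stream name
                new DefaultAWSCredentialsProviderChain(),
                "worker-1");                             // placeholder worker id

        IRecordProcessorFactory factory = () -> new IRecordProcessor() {
            public void initialize(InitializationInput input) { }
            public void processRecords(ProcessRecordsInput input) {
                // Stub: replace with your real per-record logic.
                input.getRecords().forEach(r ->
                        System.out.println(StandardCharsets.UTF_8.decode(r.getData())));
            }
            public void shutdown(ShutdownInput input) { }
        };

        new Worker.Builder().recordProcessorFactory(factory).config(config).build().run();
    }
}
```

Packaged as a runnable fat jar, this is what you copy to (or bake into the AMI of) the EC2 instance; an Ant script is just one way to start it, not a requirement.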
The best way I have found to host a consumer program is using EMR, but not as a Hadoop cluster. Package your program as a jar and place it in S3, then launch an EMR cluster and have it run your jar. Using Data Pipeline you can schedule this job flow to run at regular intervals. You can also scale the EMR cluster, or use an actual EMR job to process the stream if you want to go the more sophisticated route.
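A rough sketch of that launch using the AWS SDK for Java v2 (the cluster name, jar location, release label, instance type, and roles are placeholders to adjust):

```java
import software.amazon.awssdk.services.emr.EmrClient;
import software.amazon.awssdk.services.emr.model.*;

public class LaunchConsumerOnEmr {
    public static void main(String[] args) {
        EmrClient emr = EmrClient.create();

        // One custom-jar step that runs the consumer; the cluster terminates
        // when the step finishes (keepJobFlowAliveWhenNoSteps = false).
        StepConfig runJar = StepConfig.builder()
                .name("run-kinesis-consumer")
                .actionOnFailure(ActionOnFailure.TERMINATE_CLUSTER)
                .hadoopJarStep(HadoopJarStepConfig.builder()
                        .jar("s3://my-bucket/consumer.jar") // placeholder jar location
                        .build())
                .build();

        emr.runJobFlow(RunJobFlowRequest.builder()
                .name("kinesis-consumer")                   // placeholder cluster name
                .releaseLabel("emr-6.15.0")                 // pick a current release
                .instances(JobFlowInstancesConfig.builder()
                        .masterInstanceType("m5.xlarge")
                        .instanceCount(1)
                        .keepJobFlowAliveWhenNoSteps(false)
                        .build())
                .steps(runJar)
                .jobFlowRole("EMR_EC2_DefaultRole")
                .serviceRole("EMR_DefaultRole")
                .build());
    }
}
```

Scheduling this same call from a Data Pipeline activity gives you the regular-interval behaviour described above.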
You can also use Elastic Beanstalk. I believe this article is highly useful.