AWS run etl scheduled script on EC2 - amazon-web-services

I have some ETL Python scripts on an EC2 instance that fetch, process, and load data into a PostgreSQL database hosted on the same EC2 instance.
The request was to host the database on the EC2 instance instead of using RDS in order to save money.
The scripts take 1 hour to complete; heavy processing accounts for 90% of that time.
I would like to build a solution to automate this task.
What I would like to do is to:
Turn on the EC2 instance at a given time;
Run the scripts inside the EC2 instance;
Turn off the EC2 instance after the scripts finish their job (which means after 1 hour);
I've seen some solutions with AWS Systems Manager and AWS Lambda, but they all seem outdated since the AWS interface changed.
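One way the three steps could be wired together, as a sketch rather than a definitive setup: a schedule (for example an EventBridge rule that triggers a small Lambda calling StartInstances) turns the instance on, and a wrapper started at boot (cron @reboot or a systemd unit) runs the scripts and then powers the machine off. The code below covers only the wrapper. The script paths are hypothetical, and it assumes the instance's shutdown behavior is left at the default "stop", so an OS-level shutdown stops the instance instead of terminating it.

```python
import subprocess
import sys

# Hypothetical script paths; replace them with the real ETL scripts.
ETL_SCRIPTS = ["/opt/etl/fetch.py", "/opt/etl/process.py", "/opt/etl/load.py"]


def run_etl_then_shutdown(scripts, run=subprocess.run):
    """Run each script in order, then power off the machine.

    Returns True if every script succeeded. The shutdown command runs
    even when a script fails, so the instance never stays up by accident.
    """
    ok = True
    try:
        for script in scripts:
            # check=True raises CalledProcessError on a non-zero exit code.
            run([sys.executable, script], check=True)
    except subprocess.CalledProcessError:
        ok = False
    finally:
        # Assumption: shutdown behavior is "stop" (the default), so this
        # stops the EBS-backed instance and the compute charges with it.
        run(["sudo", "shutdown", "-h", "now"], check=False)
    return ok


# Usage on the instance, e.g. from a cron @reboot entry:
#   run_etl_then_shutdown(ETL_SCRIPTS)
```

Because the database lives on the same instance, the wrapper should only start once PostgreSQL is up; ordering the systemd unit after the database service is one way to get that.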

Related

Migrating on-premises Python ETL scripts that feed a Splunk Forwarder from a syslog box to AWS?

I've been asked to migrate on-premises Python ETL scripts that live on a syslog box over to AWS. These scripts run as cron jobs and output logs that a Splunk Forwarder parses and sends to our Splunk instance for indexing.
My initial idea was to deploy a CloudWatch-triggered Lambda function that spins up an EC2 instance, runs the ETL scripts cloned to that instance, and then brings down the instance. Another idea was to containerize the scripts and run them as task definitions. They take approximately 30 minutes to run.
Any help moving forward would be appreciated; I would like to deploy this as infrastructure as code (IaC), preferably with troposphere/boto3.
"Another idea was to containerize the scripts and run them as task definitions"
This is probably the best approach. You can include the Splunk Universal Forwarder container in your task definition (ensuring both containers are configured to mount the same storage where the logs are held) to get the logs into Splunk. You can schedule task execution just as you would schedule Lambda functions. As an alternative to the forwarder container, if you can configure the scripts to log to stdout/stderr instead of log files, you can set up your Docker log driver to send the output directly to Splunk.
Assuming you don't already have a cluster with capacity to run the task, you can use a capacity provider for the Auto Scaling group (ASG) attached to the ECS cluster to automatically provision instances into the cluster whenever the task needs to run (and scale down after the task completes).
Or use Fargate tasks with EFS storage, and you don't have to worry about cluster provisioning at all.
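Since boto3 was mentioned, here is a minimal sketch of starting the containerized ETL as a one-off Fargate task. It assumes the cluster and task definition already exist; the function name and all argument values are placeholders, and `ecs` is expected to be a boto3 ECS client such as boto3.client("ecs"). A scheduled rule can also target the task directly, in which case no code like this is needed.

```python
def run_etl_task(ecs, cluster, task_definition, subnets, security_groups):
    """Start one Fargate task and return its ARN.

    `ecs` is expected to be a boto3 ECS client; it is passed in so the
    function can be exercised without AWS credentials.
    """
    response = ecs.run_task(
        cluster=cluster,
        taskDefinition=task_definition,
        launchType="FARGATE",
        count=1,
        networkConfiguration={
            "awsvpcConfiguration": {
                "subnets": subnets,
                "securityGroups": security_groups,
                # Assumption: private subnets with a NAT gateway or VPC
                # endpoints, so the task can still pull its image.
                "assignPublicIp": "DISABLED",
            }
        },
    )
    # run_task reports placement problems in "failures" instead of raising.
    if response.get("failures"):
        raise RuntimeError(f"ECS could not start the task: {response['failures']}")
    return response["tasks"][0]["taskArn"]


# Usage (requires boto3 and AWS credentials; all values are placeholders):
#   import boto3
#   arn = run_etl_task(boto3.client("ecs"), "etl-cluster", "etl-task:1",
#                      ["subnet-0123"], ["sg-0123"])
```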

Run a batch file on EC2 from a (python) lambda

I can see a generic way of starting an EC2 instance from Lambda in "Start and Stop Instances at Scheduled Intervals Using Lambda and CloudWatch".
Suppose I use that method to start an EC2 instance, and suppose the AMI is a Windows Server 2019 image customised to have a .bat file on the desktop, and also suppose I'm using a Python Lambda.
How can I execute this batch file from the Lambda function? (i.e. just as though someone had RDP'd into the instance and double-clicked on it)
Note: To be very clear, I want to start the EC2 instance using the method given in the AWS docs (above) and, right after the instance has started, run the batch file that will be sitting on the instance's desktop.
I think you have a few concepts mixed together.
AWS Lambda functions run on the Lambda service, without having to use Amazon EC2 instances. This is what makes them "serverless".
If you have a batch file on an Amazon EC2 instance, you would presumably want to run that batch file on the EC2 instance itself, without involving Lambda (since you have got a server).
If you wish to run a script on an EC2 instance when it launches for the first time, you can provide a PowerShell or Command-Line script via the User Data field. Software on the AMI will automatically execute this script the first time that the instance starts.
This script could do all the work itself, or it could simply call another script that is stored on the disk. Some people use the script to download another script from a repository (e.g. Amazon S3 or GitHub) and then execute the downloaded script.
For more information, see: Running Commands on Your Windows Instance at Launch - Amazon Elastic Compute Cloud
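As a rough illustration of the User Data approach, the helper below builds a Windows user data string that calls a batch file already present on the AMI. The helper name and the batch file path are hypothetical. The script and persist tags are the ones described in the Windows user data documentation; whether persist is honored depends on the launch agent installed on the AMI, so treat that part as an assumption to verify.

```python
def windows_user_data(batch_file, persist=False):
    """Build a Windows user data script that calls an existing batch file.

    By default user data runs only on the first launch. persist=True adds
    the <persist> tag, which asks the launch agent to run the script on
    every boot (assumption: the AMI's launch agent supports it).
    """
    lines = ["<script>", f'call "{batch_file}"', "</script>"]
    if persist:
        lines.append("<persist>true</persist>")
    return "\n".join(lines)


# Hypothetical path; the real file sits wherever the AMI placed it.
user_data = windows_user_data(r"C:\Users\Administrator\Desktop\job.bat")
```

The resulting string would be passed as the UserData parameter when launching the instance, for example through the console field or boto3's run_instances.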
If the Amazon EC2 instance is already running and you wish to trigger a script to execute, you can use the AWS Systems Manager Run Command. This works by having an agent on the instance which can be remotely triggered, thereby running scripts without having to log in to the instance.
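A minimal sketch of the Run Command route from Python, for example inside the Lambda that started the instance. It assumes the instance runs the SSM Agent and has an instance profile that lets Systems Manager manage it; the function name and batch file path are hypothetical, and `ssm` is expected to be a boto3 SSM client such as boto3.client("ssm").

```python
def run_batch_file(ssm, instance_id, batch_file):
    """Ask Systems Manager to run a batch file on a Windows instance.

    Returns the command ID, which can be used later to look up the result.
    """
    response = ssm.send_command(
        InstanceIds=[instance_id],
        # AWS-managed document that runs PowerShell commands on Windows.
        DocumentName="AWS-RunPowerShellScript",
        # The call operator (&) lets PowerShell execute the quoted path.
        Parameters={"commands": [f'& "{batch_file}"']},
    )
    return response["Command"]["CommandId"]


# Usage (requires boto3 and AWS credentials; values are placeholders):
#   import boto3
#   run_batch_file(boto3.client("ssm"), "i-0123456789abcdef0",
#                  r"C:\Users\Administrator\Desktop\job.bat")
```

The command runs through the agent in the background, not in an interactive desktop session, so a batch file that opens windows or waits for input may behave differently than a double-click.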

Have Jenkins start an EC2 instance and terminate it

I have a full deployment job that takes the IP of a running instance and deploys my system on it.
I currently keep an EC2 instance for automation tests that run every night, but the instance is expensive and I'm looking for a way to start it before the tests and terminate it after the tests.
I looked for EC2 plugins that can help, and the closest one was this, but it is for making slaves and that's not what I want.
I want to be able to launch an EC2 instance, and pass its IP address to the automation tests job, then terminate that instance once done.
I started writing a command-line bash script for this, but it seems like too much work, and I thought maybe there is something I'm missing.
Your requirement is valid, and Amazon accounts for it:
When you stop an instance, we shut it down. We don't charge usage for a stopped instance, or data transfer fees, but we do charge for the storage for any Amazon EBS volumes.
Reference:
- https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Stop_Start.html
Here are some approaches to start/stop your instances:
Amazon EC2 HTTP API
This is an HTTP query API, so you can start or stop your instance with a single (signed) HTTP request:
Amazon EC2 API Reference
Start Instance endpoint
https://ec2.amazonaws.com/?Action=StartInstances&...
Stop Instance endpoint
https://ec2.amazonaws.com/?Action=StopInstances&...
You can invoke this API from Jenkins in many ways: a simple shell step, Groovy, or scripted/declarative pipelines.
AWS CLI
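start instance: aws ec2 start-instances --instance-ids <instance-id>
stop instance: aws ec2 stop-instances --instance-ids <instance-id>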
Here is more about how to suspend instances using the AWS CLI:
https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-suspend-resume-processes.html
Also with PowerShell: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Stop_Start.html
You can run these commands from Jenkins in the same ways: a simple shell step, Groovy, or scripted/declarative pipelines.
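If a Python script called from a Jenkins shell step is acceptable, the same start/stop calls could look roughly like this. It is a sketch that assumes an existing, stopped instance; the function names are made up for illustration, and `ec2` is expected to be a boto3 EC2 client such as boto3.client("ec2").

```python
def start_instance_and_get_ip(ec2, instance_id):
    """Start a stopped instance, wait until it runs, and return its IP."""
    ec2.start_instances(InstanceIds=[instance_id])
    # Block until EC2 reports the instance state as "running".
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    instance = reservations[0]["Instances"][0]
    # Prefer the public address; fall back to the private one inside a VPC.
    return instance.get("PublicIpAddress") or instance["PrivateIpAddress"]


def stop_instance(ec2, instance_id):
    """Stop the instance and wait until it is fully stopped."""
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])


# Usage in a Jenkins step (requires boto3 and AWS credentials):
#   import boto3
#   ec2 = boto3.client("ec2")
#   ip = start_instance_and_get_ip(ec2, "i-0123456789abcdef0")
#   ... pass ip to the automation tests job, then:
#   stop_instance(ec2, "i-0123456789abcdef0")
```

"Running" only means the instance has booted far enough for EC2 to report it; the tests job may still need to retry its first connection until the services on the machine are up.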
AWS Instance Scheduler
In 2018, AWS launched the AWS Instance Scheduler, a new and improved scheduling solution that enables customers to schedule Amazon EC2 instances.
With this tool you can automatically start and stop Amazon EC2 and Amazon RDS instances.
Reference:
https://aws.amazon.com/answers/infrastructure-management/instance-scheduler/
With this approach you don't need Jenkins :b

Scheduling the stopping/starting of an EC2 instance when not in use by a Beanstalk Deployment or an ECS task?

I have a Docker image containing Python code and third-party binary executables. There are only outbound network requests. The image must run hourly and each execution lasts ~3 minutes.
I can:
Use an EC2 instance and schedule hourly execution via cron
Create a CloudWatch Event/Rule to run an ECS Task Definition hourly
Set up an Elastic Beanstalk environment and schedule hourly deployment of the image
In all of these scenarios, an EC2 instance is running 24/7 and I am being charged for extended periods of no usage.
How do I schedule an existing EC2 instance to start hourly and to stop once my Docker container has finished running?
Here's one approach I can think of. It's very high-level, and omits some details, but conceptually it would work just fine. You'll also need to consider the Identity & Access Management (IAM) Roles used:
CloudWatch Event Rule to trigger the Step Function
AWS Step Function to trigger the Lambda function
AWS Lambda function to start up EC2 instances
EC2 instance polling the Step Functions service for Activity Tasks
Create a CloudWatch Event Rule to schedule a periodic task, using a cron expression
The Target of the CloudWatch Event Rule is an AWS Step Function
The AWS Step Function State Machine starts by triggering an AWS Lambda function, which starts the EC2 instance
The next step in the Step Functions State Machine invokes an Activity Task, representing the Docker container that needs to execute
The EC2 instance has a script running on it, which polls the Activity Task for work
The EC2 instance executes the Docker container, waits for it to finish, and sends a completion message to the Step Functions Activity Task
The script running on the EC2 instance shuts itself down
The AWS Step Function ends
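The polling script on the EC2 instance could be sketched as follows. The worker name and the output payload are made up for illustration, and `sfn` is expected to be a boto3 Step Functions client such as boto3.client("stepfunctions"). The docker command is passed through an injectable `run` callable so the logic can be tried without Docker installed.

```python
import json
import subprocess


def poll_once(sfn, activity_arn, docker_image, run=subprocess.run):
    """Poll the activity once, run the container, and report the result.

    Returns False when no task was waiting, True when a task was handled.
    """
    # get_activity_task long-polls; with no work it returns no task token.
    task = sfn.get_activity_task(activityArn=activity_arn, workerName="etl-worker")
    token = task.get("taskToken")
    if not token:
        return False
    try:
        # check=True raises CalledProcessError if the container exits non-zero.
        run(["docker", "run", "--rm", docker_image], check=True)
    except subprocess.CalledProcessError as exc:
        sfn.send_task_failure(taskToken=token, error="ContainerFailed", cause=str(exc))
    else:
        # The output must be a JSON string; this payload is only an example.
        sfn.send_task_success(taskToken=token, output=json.dumps({"image": docker_image}))
    return True


# Usage on the instance (requires boto3, Docker, and AWS credentials):
#   import boto3
#   sfn = boto3.client("stepfunctions")
#   while not poll_once(sfn, activity_arn, "my-image:latest"):
#       pass
#   ... then shut the instance down, as in the last step of the list.
```

Because the long poll can stay open for about a minute, the client's read timeout should be set a little higher than that when creating it.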
Keep in mind that a potentially better option would be to spin up a new EC2 instance every hour, instead of simply starting and stopping the same instance. Although you might get better startup performance by starting an existing instance vs. launching a new instance, you'll also have to spend time to maintain the EC2 instance like a pet: fix issues if they crop up, or patch the operating system periodically. In today's world, it's a commonly accepted practice that infrastructure should be disposable. After all, you've already packaged up your application into a Docker container, so you most likely don't have overly specific expectations around which host that container is actually being executed on.
Another option would be to use AWS Fargate, which is designed to run Docker containers, without worrying about spinning up and managing container infrastructure.
AWS Step Functions
AWS Fargate
Blog: AWS Fargate: An Overview
Creating a CloudWatch Event Rule that triggers on a schedule

Automate exe installation on AWS EC2 instances

Is there any way to install exe/MSI agents on AWS EC2 instances in an automated way? Specifically, I am looking for a counterpart of Azure's Custom Script Extension. [Free of cost]
Scenario:
I want to install BigFix and Datadog agents on 1000 EC2 instances. This is a one-time job, so I am not looking for any solution that involves Chef, Puppet, etc.
Yes, you can pass a script to the instance that will be executed on the first boot (but not thereafter). It is often referred to as a User Data script.
See:
Running Commands on Your Windows Instance at Launch - Amazon Elastic Compute Cloud
Running Commands on Your Linux Instance at Launch - Amazon Elastic Compute Cloud
If you wish to install after the instance has started, use the AWS Systems Manager Run Command.
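For a one-time install across many running instances, Run Command can be driven from a short script. The sketch below assumes Windows instances managed by Systems Manager and sends the install commands in batches, because, as far as I know, a single send_command call accepts at most 50 IDs in InstanceIds. The function names and the installer path in the usage comment are hypothetical, and `ssm` is expected to be a boto3 SSM client such as boto3.client("ssm").

```python
def chunks(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def install_on_instances(ssm, instance_ids, commands, batch_size=50):
    """Send the same PowerShell commands to many instances in batches.

    Returns one command ID per batch so results can be checked later.
    """
    command_ids = []
    for batch in chunks(instance_ids, batch_size):
        response = ssm.send_command(
            InstanceIds=batch,
            DocumentName="AWS-RunPowerShellScript",
            Parameters={"commands": commands},
        )
        command_ids.append(response["Command"]["CommandId"])
    return command_ids


# Usage (requires boto3 and AWS credentials; the MSI path is hypothetical):
#   import boto3
#   install_on_instances(
#       boto3.client("ssm"),
#       all_instance_ids,
#       ["Start-Process msiexec.exe -ArgumentList '/i C:\\Temp\\agent.msi /qn' -Wait"],
#   )
```

Targeting instances by tag instead of by ID is another option that avoids listing the IDs at all.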