I have a daily process that needs to digest a tremendous amount of data from two external sources. It normally requires around 28GB or RAM, and a decent amount of processing power. Due to this, an AWS Lambda won't work.
In the meantime, I've been running the process on an EC2 instance. In order to save resources, I've attempted to start the instance using a CloudWatch event. Since no event exists for "StartEC2," I'm kicking off a AWS Lambda instead, which in turn starts the EC2 isntance using Amazon support libraries.
All of this is extremely cumbersome, and I've been looking for a library or pattern that can do what I want. Essentially, I need to start an EC2 instance on a cron/event, deliver a unit of work to it (Shell Script, Java App, whatever), have it run it, then shutdown.
I'd love any suggestions for accomplishing this.
Look into AWS Systems Manager (SSM), you can create an Automation document that will launch the instance, run any custom scripts or tasks, and shut it down again when you're done. You can trigger the SSM Automation with a cron schedule via CloudWatch Events.
You may also want to consider AWS Batch for this type of workload.
Related
I have scheduled 2 cronjobs for my application.
My Application server is in an autoscaling group and I kept a minimum of 2 instances because of High availability. Everything working is fine but cron job is running multiple times because of 2 instances in autoscaling.
I could not limit the instance size to 1 because already my application in the production environment I prefer to have HA.
How should I have to limit execute cron job on a single instance? or should i have to use other services like AWS Lamda or AWS ELasticBeanstalk
Firstly you should consider whether running the crons on these instances is suitable. If you're trying to keep this highly available and it is directly interacted via customers what will the impact of the crons performance be?
Perhaps consider using a separate autoscaling group or instance with a total of 1 instances to run these crons? You could launch the instance or update the autoscaling group just before the cron needs to run and then automate the shutdown after it has completed.
Otherwise you would need to consider using a locking mechanism for your script. By using this your script write a lock to confirm that it is in process, at the beginning of the script run it would check whether there was any script lock in progress. To further prevent the chance of a collision between multiple servers consider adding jitter (random seconds of sleep) to the start of your script.
Suitable technologies for writing a lock are below:
DynamoDB using strongly consistent reads.
EFS for a Linux application, or FSX for a Windows application.
S3 using strong consistency.
Solutions suggested by Chris Williams sound reasonable if using lambda function is not an option.
One way to simulate cron job is by using CloudWatch Events (now known as EventBridge) in conjunction with AWS Lambda.
First you need to write a Lambda function with the code that needs to be executed on a schedule. Lambda supports cron expressions.
You can then use Schedule Expressions with EventBridge/CloudWatch Event in the same way as a cron tab and mention the Lambda function as target.
you can enable termination protection on of the instance. Attach necessary role & permission for system manager. once the instance is available under managed instance under system manager you can create a schedule event in cloudwatch to run ssm documents. if you are running a bash script convert that to ssm document and set this doc as targate. or you can use shellscript document for running commands
We have an EC2 server that runs cronjobs. Currently there is a crontab on that server that holds the cronjob settings. Everything runs perfectly fine on this server.
Would it be overkill to use AWS Cloudwatch Events to trigger the crons instead? ie create a cloudwatch event that calls a lambda to run a shell command on the EC2 instance.
My thinking is that these would be possible benefits:
no need to manage a crontab file on the EC2 server
easier to activate/deactivate specific cronjobs
looks like there are indeed benefits according to the AWS Docs:
https://aws.amazon.com/blogs/compute/scheduling-ssh-jobs-using-aws-lambda/
Decouple job schedule and AMI: If your cron jobs are part of an AMI, each schedule change requires you to create a new AMI version, and update existing instances running with that AMI. This is both cumbersome and time-consuming. Using scheduled Lambda functions, you can keep the job schedule outside of your AMI and change the schedule on the fly.
Flexible targeting of EC2 instances: By abstracting the job schedule from AMI and EC2 instances, you can flexibly target a subset of your EC2 instance fleet based on tags or other conditions. In this example, we are targeting EC2 instances with the “Environment=Dev” tag.
Intelligent scheduling: With scheduled Lambda functions, you can add custom logic to you abstracted job scheduler.
In my experience it's not an over kill at all. I have used same setup with great success running job(s) (around 50 different jobs) with heavy workload.
My setup was slightly different
The cloudwatch scheduled event was calling a lambda which in turn was putting a messages on a sqs and application in running on ec2 instance(s) was grabbing messages from the sqs and processing them.
The sqs was simply added for robustness.
But this may or may not make sense in your use case.
I have an AWS workflow which is as follows:
API call -> Lambda Function (Paramiko Remote Connect) -> EC2 -> output
Basically, I have an API call, which triggers a lambda function. Within the lambda function, I remote connect to a preconfigured EC2 instance using Python Paramiko, run some commands on the ec2 instance, and then return the output. I have two main concerns with this design: 1.) latency and 2.) scalability.
For Latency:
When I call the API, it takes 8-9 seconds to run, but if I were to run the job directly on the EC2 instance, it would take 1-2 seconds. Do the ssh_client.connect() and ssh_client.exec_command() cause significantly increased runtime? Also, I am implementing this on a t2-micro ubuntu 18.04 free-tier EC2 instance. Would using the paid versions cause a difference in runtime?
For Scalability:
I am sure AWS has a solution for this, but suppose that there are several simultaneous API calls. I am sure that I can't have only 1 available EC2 instance to run the job. Should I have multiple EC2 instances preconfigured and use a load-balancer? What AWS features can I use to scale this system?
If anything is unclear, please ask and I will elaborate.
Rather than using Paramiko, the more "cloud-friendly" method of running commands on an EC2 instance would be to use AWS Systems Manager Run Command, which uses an agent to run commands on instance. It can even run commands on multiple instances and also on-premises computers that have the agent installed.
Another design choice is to push a "job" message to an Amazon SQS queue. The worker instances can poll the SQS queue asking for work. When they receive a message, they can perform the work. This is more of an asynchronous model because the main system does not 'wait' for job to finish, so it needs a return path to provide the results (eg another SQS queue). However, it is highly scalable and more resilient, with no load balancer required. This is a common design pattern.
I have an AWS CLI invocation (in this case, to launch a configured EMR cluster to do some steps and then shut down) but I'm not sure how to go about running it daily.
I guess one way to do it is an EC2 micro instance running a cron job, or an ECS task in a micro that launches the command, but that all seems like it might be overkill. It looks like there's also a way to do it in Lambda, but rom what I can tell it'd be kludgy.
This doesn't have to be a good long-term solution, something that's suitable until I can do it right (Data Pipelines) would work just fine.
Suggestions?
If it is not a strict requirement to use the AWS CLI, you can use one of the AWS SDK instead to programmatically invoke Lambda.
Schedule a CloudWatch Rules using cron
When configured, the CloudWatch Rules will trigger a Lambda function
Implement a Lambda function that calls EMR using one of the supported SDKs (e.g. the EMR class in the AWS JavaScript SDK)
Make sure that you have the IAM configuration in place
Full example is available in the Schedule AWS Lambda Functions Using CloudWatch Events
Kludgy? Yes, configuration is needed, however if you take into account the amount of work required to launch EC2 / ECS (and make sure that it re-launches in the event of failure), I'd say it evens out.
Not sure about the whole task that you are doing, but to avoid doing it:
Manually
Avoid another set up for resources in AWS (as you mentioned)
I would create a simple job in a Continuous Integration (CI) server like jenkins,bamboo,circleci ..... (list can go on). I would assume that you might already have a CI server running, why not use it?
I have few EC2 servers in AWS. Whenever the disk space exceeds a limit, i want to delete some files (may be logs folder) in EC2 instance automatically. I am planning to use Lambda and cloudwatch for this. Can i use Lambda to interact with EC2. If not possible, what is the alternate approach to achieve this functionality.
This is not an appropriate use-case for an AWS Lambda function.
AWS Lambda is suitable for tasks where compute is required in response to an event. Your use-case, however, is to manipulate information on an EC2 instance, which does not need cloud compute.
You could run a script on each each computer, triggered by a Scheduled Task.
Alternatively, you could use the Systems Manager Run Command (also known as the EC2 Run Command), which allows you to run commands on multiple Amazon EC2 instances and view the results. This could be used to trigger a local script, or it could pass the whole command to run (including the script). It is purpose-built for the type of task you describe.
AWS Lambda has access to your instances if they are available in the internet. If they are not available in the internet, it is possible to give access to AWS lambda using a NAT or instance Gateway in your VPC.
The problem is: access to your instance does not means access to the instances filesystems. To delete the files from Lambda you can use two alternatives:
Configure a network filesystem service in your instances an connect
to this services in your lambda function. Using windows you would
just "share" your disks, but in that case you would use some SMB
library in your lambda code, that "I think" did not have native SMB
support. Just keep in mind that your security guy will scream out
loud when you propose this alternative.
Create a "agent" in your EC2 instances and keep it running as a
Windows Service and call this agent from your lambda function. In
that case, the lambda will start the execution of the agent that
will be responsible for the file deletion.
Another option, is to follow Ramesh's suggestion and create a Powershell script and configure a cron job. To be easy, you can create a Image with this Powershell script and use the image to initialize each instance. The same solution would be applicable to "the agent" solution in the lambda alternantives.
I think that, in any case, you will need to change something in your 150 servers. Using a customized image can help you to simplify this a little bit, but you will not get a solution without some changes.
According to the following thread, you cannot access files inside a EC2 VM unless you are exposing files to the public using different methodology.
AWS Forum
Quoting from the forum
If you are talking about the underlying EC2 instance, answer is No, you cannot access those files.
However as a solution for your problem, you can used scheduled job to cleanup your files depending your usage. You can use a service or cron job.