AWS question: How can I get CloudWatch event data in a Fargate task with Python?

I'm new to CloudWatch Events and to Fargate. I want to trigger a Fargate task (Python) to run whenever a file is uploaded to a specific S3 bucket. I can get the task to run whenever I upload a file, and can see the name in the event log; however, I can't figure out a simple way to read the event data in Fargate. I've been researching this for the past couple of days and haven't found a solution other than reading the event log or using a Lambda to invoke the task and put the event data in a message queue.
Is there a simple way to obtain the event data in Fargate with boto3? It's likely that I'm not looking in the right places or asking the right question.
Thanks

One of the easiest options you can configure is two targets for the same S3 upload event:
Push the same event to SQS.
Launch the Fargate task at the same time.
Read the event message from SQS once the Fargate task is up (no Lambda in between); the same task definition will still work for the normal use case. Make sure you exit the process after reading the message from SQS.
So whenever the Fargate task comes up, it reads its event data from SQS; a minimal sketch of that read follows.
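A hedged boto3 sketch of the Fargate side, assuming the rule also pushes the S3 event to an SQS queue named s3-upload-events (a hypothetical name) in the same region as the task:

```python
# Sketch only: the queue name "s3-upload-events" is a placeholder.
import json
import boto3

sqs = boto3.client("sqs")

def read_upload_event():
    queue_url = sqs.get_queue_url(QueueName="s3-upload-events")["QueueUrl"]
    # Long poll so the task waits briefly for the matching message.
    resp = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=1,
        WaitTimeSeconds=20,
    )
    messages = resp.get("Messages", [])
    if not messages:
        return None
    msg = messages[0]
    event = json.loads(msg["Body"])
    # Delete the message so it is not processed again, then let the task exit.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
    return event

if __name__ == "__main__":
    event = read_upload_event()
    print(event)  # should contain the bucket and object key from the S3 event
```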

To do this you would need to use an input transformer.
Each time an event rule is triggered, a JSON object is made available to use in the transformation.
As the event itself is not accessible within the container (unlike with Lambda functions), the idea is that you forward the key information as environment variables and handle it in your container.
At this time not every service appears to support this in the console, so you have the following options:
CloudFormation
Terraform
CLI
You can view a tutorial for this exact scenario from this link.
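As a rough illustration of the CLI/SDK route, here is a hedged boto3 sketch. The rule name, cluster and task definition ARNs, role, container name, and subnet are all placeholders, and the JSON paths in the input transformer depend on how the S3 event reaches EventBridge (the paths below assume the native S3-to-EventBridge event shape):

```python
# Sketch only: all names, ARNs and subnets below are placeholders.
import boto3

events = boto3.client("events")

# The transformed input is passed to the ECS target as a task override;
# here it sets environment variables on the container. Placeholders like
# <bucket> are substituted by EventBridge from InputPathsMap.
input_template = (
    '{"containerOverrides": [{'
    '"name": "my-container", '
    '"environment": ['
    '{"name": "S3_BUCKET", "value": <bucket>}, '
    '{"name": "S3_KEY", "value": <key>}'
    ']}]}'
)

events.put_targets(
    Rule="s3-upload-rule",
    Targets=[{
        "Id": "fargate-task",
        "Arn": "arn:aws:ecs:us-east-1:123456789012:cluster/my-cluster",
        "RoleArn": "arn:aws:iam::123456789012:role/ecsEventsRole",
        "EcsParameters": {
            "TaskDefinitionArn": "arn:aws:ecs:us-east-1:123456789012:task-definition/my-task",
            "TaskCount": 1,
            "LaunchType": "FARGATE",
            "NetworkConfiguration": {
                "awsvpcConfiguration": {
                    "Subnets": ["subnet-0123456789abcdef0"],
                    "AssignPublicIp": "ENABLED",
                }
            },
        },
        "InputTransformer": {
            # Paths assume the S3 -> EventBridge event format; adjust for
            # CloudTrail-based events if that is your source.
            "InputPathsMap": {
                "bucket": "$.detail.bucket.name",
                "key": "$.detail.object.key",
            },
            "InputTemplate": input_template,
        },
    }],
)
```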

Related

Can an aws target be triggered by multiple cloudwatch event rules

I'm trying to set up a scheduler system for our infrastructure that is supposed to take care of all scheduled housekeeping tasks. Our proposal is to keep it simple and scalable with one Docker image. A script for each task and a CloudWatch event rule will be passed in as parameters. The scripts will be uploaded to an S3 bucket and downloaded when the job is triggered. This way we can avoid redeploying every time a task gets added.
The only tricky part is passing in the CloudWatch event rule as a parameter.
Can an event target be triggered by multiple rules? Am I being too ambitious with this project? I use Terraform to provision it.
Turn CloudWatch Logs on.
Create a metric filter.
Assign a metric.
Create an alarm.
Here is a tutorial which you can modify to suit your needs.
https://aws.amazon.com/blogs/security/how-to-receive-notifications-when-your-aws-accounts-root-access-keys-are-used/
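If you prefer to script those steps, a minimal boto3 sketch might look like the following; the log group name, filter pattern, metric names, and SNS topic are made up for illustration:

```python
# Sketch only: the log group, filter pattern, metric names and SNS topic are placeholders.
import boto3

logs = boto3.client("logs")
cloudwatch = boto3.client("cloudwatch")

# Create a metric filter on an existing log group and assign a metric to it.
logs.put_metric_filter(
    logGroupName="/my/housekeeping/logs",
    filterName="task-errors",
    filterPattern="ERROR",
    metricTransformations=[{
        "metricName": "TaskErrors",
        "metricNamespace": "Housekeeping",
        "metricValue": "1",
    }],
)

# Create an alarm on that metric.
cloudwatch.put_metric_alarm(
    AlarmName="housekeeping-task-errors",
    Namespace="Housekeeping",
    MetricName="TaskErrors",
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:alerts"],  # placeholder topic
)
```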

How do I run AWS Dead Letter Queue periodically?

Requirement
I'd like to push DLQ messages to a Lambda function periodically, like a cron job.
Situation
I created a batch processing system using SQS and Lambda.
I also prepared a DLQ (Dead Letter Queue) to store the failed messages.
AWS proposes re-running the failed messages in the DLQ with a Lambda trigger.
https://docs.aws.amazon.com/lambda/latest/dg/invocation-retries.html
I'd like to re-run them periodically and automatically, like cron, but AWS only describes how to do it manually.
For a manual re-run, I just map the Lambda to the DLQ and it's all done; after that I can re-run messages on demand with the Enable and Disable buttons.
Doing it automatically is more complicated, because there is no API for switching a Lambda trigger between Enable and Disable.
The boto3 APIs create_event_source_mapping() and delete_event_source_mapping() seem to be the better way.
But delete_event_source_mapping() requires the UUID of the event source mapping, which suggests I need some datastore such as ElastiCache.
However, I'd rather not provision extra resources for this if possible.
My Solution
A CloudWatch Event calls the Lambda.
The Lambda activates (enables) the event source mapping using create_event_source_mapping().
The Lambda deactivates (disables) the event source mapping using delete_event_source_mapping().
It looks good at first, but in the third step the Lambda needs to know the UUID returned by the create call in the second step, so this approach seems to need a datastore for the UUID; a sketch of the flow is below.
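Purely to illustrate the flow described above (the DLQ ARN, function name, and batch size are placeholders), a hedged boto3 sketch; the delete call is exactly where the UUID from the create call is required:

```python
# Sketch only: the DLQ ARN and function name are placeholders.
import boto3

lambda_client = boto3.client("lambda")

DLQ_ARN = "arn:aws:sqs:us-east-1:123456789012:my-dlq"
FUNCTION = "reprocess-dlq"

def enable_reprocessing():
    # Attach the DLQ as an event source; the response contains the mapping UUID.
    resp = lambda_client.create_event_source_mapping(
        EventSourceArn=DLQ_ARN,
        FunctionName=FUNCTION,
        BatchSize=10,
    )
    return resp["UUID"]

def disable_reprocessing(uuid):
    # Detach it again; this call needs the UUID returned by the create call,
    # which is the value that would otherwise have to live in a datastore.
    lambda_client.delete_event_source_mapping(UUID=uuid)
```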
Do you have any solutions without datastore?
Thanks.

AWS ECS Task single instance

In my architecture when I receive a new file on S3 bucket, a lambda function triggers an ECS task.
The problem occurs when I receive multiple files at the same time: the Lambda will trigger multiple instances of the same ECS task, which act on the same shared resources.
I want to ensure that only one instance of a specific ECS task is running; how can I do that?
Is there a specific setting that can ensure it?
I tried to query the ECS cluster before running a new instance of the ECS task, but (using the AWS Python SDK) I didn't receive any information while the task is in PROVISIONING status; the SDK only returns data once the task is PENDING or RUNNING.
Thank you
I don't think you can control that, because your S3 events will keep triggering new tasks. Checking whether the task is already running is difficult, and you might miss executions if you receive a lot of files.
You should approach it differently to achieve what you want. If you want only one task processing at a time, forget about triggering the ECS task directly from the S3 event. It will work better if you implement a queue: your S3 event should add the information (via Lambda, maybe?) to an SQS queue.
From there you can have an ECS service doing SQS long polling and processing one message at a time; a minimal sketch of that loop follows.
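A hedged sketch of that worker loop (the queue URL and the body of process() are placeholders), taking a single message per iteration so only one file is handled at a time:

```python
# Sketch only: the queue URL and the process() body are placeholders.
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/file-events"

def process(body):
    # Act on the shared resource here, one file at a time.
    print("processing", body)

while True:
    # Long poll and take at most one message per iteration.
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=1,
        WaitTimeSeconds=20,
    )
    for msg in resp.get("Messages", []):
        process(msg["Body"])
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```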

How to extract event relayed from AWS EventBridge to ECS Fargate

I articulate the question as follows:
Is the EventBridge event relayed to the ECS task? (I can't see how useful it would be if the event is not relayed.)
If the event is relayed, how can it be extracted from within, say, a Node app running as the task?
Some context is due: it is possible to set an EventBridge rule to trigger ECS Fargate tasks as the result of events sourced from, say, CodeCommit. Mind you, the issue here is the sink/target, not the source. I was able to trigger a Fargate task as I updated my repo, and I could have used other events. My challenge is extracting the relayed event (in this case, repository name, commitId, etc.) from Fargate.
The EventBridge documentation is clear on how to set the rules to trigger events but is mum on how events can be extracted, which makes sense, as the sink/target documentation should have the necessary reference. But the ECS documentation is not clear on how to extract relayed events.
I was able to inspect the task metadata and process.env, but I could not find the event in either place.
I added a CloudWatch Log Group as a target for the same rule and was able to extract the event there, so it is certainly relayed to some targets; I'm just not sure whether events are relayed to the ECS task.
Therefore, the questions arise: is the event relayed to the ECS Task? If so, how would you access it?

Which one is a better scheduler in AWS Data Pipeline and AWS SWF

I have a situation where I have to trigger my workflow based on this condition: it has to process all files in S3 and then start again when there are new files in S3. However, I found that Data Pipeline starts on every scheduled interval, while SWF starts and ends the job, which also shuts down my EMR cluster. Neither is suitable for a process which has to start, or be triggered, based on a condition. Is there any alternative, or could one of SWF and Data Pipeline still perform my task?
This is more of a corollary to #Chris's answer. You still make use of Lambda: listen for the S3 Put event trigger, so every time a new object is created, the Lambda function is called.
The Lambda function can pick up the S3 object's key and put it in SQS; you can then run a separate worker process which picks items from the queue (see the sketch after the list below).
To reiterate your statement:
It has to process all files in S3 [can be done by Lambda]
and then start again when there are files in S3 [can be done by SQS & EC2]
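A hedged sketch of the Lambda side (the queue URL is a placeholder, and the handler name must match your Lambda configuration):

```python
# Sketch only: the queue URL is a placeholder.
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/files-to-process"

def handler(event, context):
    # S3 Put notifications arrive under "Records"; forward each object's key to the queue.
    for record in event.get("Records", []):
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps({
                "bucket": record["s3"]["bucket"]["name"],
                "key": record["s3"]["object"]["key"],
            }),
        )
```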
Look at Lambda. You can set up a trigger so that your code is invoked each time a new object is uploaded to S3.
Data Pipeline supports the concept of preconditions, which can trigger your execution based on conditions. The S3KeyExists precondition seems like what you're looking for: it will begin the execution of your activity when a particular S3 key exists.
Data Pipeline will also manage the creation and termination of your resource (EC2 or EMR) based on the activity's execution. If you wish to use your own EC2 instance or EMR cluster you can look into worker groups. Worker group resources are managed by you and will not be terminated by the service.
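For illustration only, a hedged boto3 fragment of what such a precondition and worker-group activity might look like; the pipeline id, S3 key, command, and worker group name are placeholders, and a real definition would also need the Default object and schedule, which are omitted here:

```python
# Sketch only: a fragment of a pipeline definition; the pipeline id, S3 key and the
# remaining objects (Default object, schedule, resources) are placeholders/omitted.
import boto3

datapipeline = boto3.client("datapipeline")

datapipeline.put_pipeline_definition(
    pipelineId="df-0123456789ABCDEFGHIJ",
    pipelineObjects=[
        {
            # The precondition: only run when this key exists in S3.
            "id": "InputExists",
            "name": "InputExists",
            "fields": [
                {"key": "type", "stringValue": "S3KeyExists"},
                {"key": "s3Key", "stringValue": "s3://my-bucket/incoming/ready.flag"},
            ],
        },
        {
            # The activity that should only start once the precondition is satisfied.
            "id": "ProcessFiles",
            "name": "ProcessFiles",
            "fields": [
                {"key": "type", "stringValue": "ShellCommandActivity"},
                {"key": "command", "stringValue": "echo process files here"},
                {"key": "precondition", "refValue": "InputExists"},
                # Worker group: a resource you manage polls for tasks with this name,
                # so the service does not create or terminate it for you.
                {"key": "workerGroup", "stringValue": "my-worker-group"},
            ],
        },
    ],
)
```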