I have an AWS-hosted website that takes images for processing and adds them to an SQS queue. Is it possible to automatically start the processing instance whenever there is something in the queue using AWS services, or should I do it manually in my backend's code?
Yes, you can use EC2 and SQS in conjunction. Please go through this blog: https://aws.amazon.com/articles/1464
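For the "start automatically" part, the usual pattern is to put the processing instance in an Auto Scaling group and scale on the queue depth metric. Here is a minimal boto3 sketch of that idea (not taken from the linked blog); the group name "image-workers-asg" and queue name "image-processing-queue" are placeholders:

```python
import boto3

autoscaling = boto3.client("autoscaling")
cloudwatch = boto3.client("cloudwatch")

# Step-scaling policy that adds one worker instance when triggered.
policy = autoscaling.put_scaling_policy(
    AutoScalingGroupName="image-workers-asg",  # placeholder
    PolicyName="scale-out-on-queue-depth",
    PolicyType="StepScaling",
    AdjustmentType="ChangeInCapacity",
    StepAdjustments=[{"MetricIntervalLowerBound": 0.0, "ScalingAdjustment": 1}],
)

# Alarm on the SQS queue depth; fires as soon as any message is visible.
cloudwatch.put_metric_alarm(
    AlarmName="image-queue-not-empty",
    Namespace="AWS/SQS",
    MetricName="ApproximateNumberOfMessagesVisible",
    Dimensions=[{"Name": "QueueName", "Value": "image-processing-queue"}],  # placeholder
    Statistic="Sum",
    Period=60,
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[policy["PolicyARN"]],
)
```

With MinSize=0 on the group, the instance only runs while there is work; your worker code still polls the queue itself, and you would add a matching scale-in policy for when the queue is empty.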
I have a local program that inputs a video, uses a TensorFlow model to do object classification, and then does a bunch of processing on the objects. I want to get this running in AWS, but there is a dizzying array of AWS services. My desired flow is:
video gets uploaded to S3 --> do classification and processing on each frame of said video --> store results in S3.
I've used Lambda for similar work, but this program relies on 2 different models and its overall size is ~800 MB.
My original thought is to run an EC2 instance that can be triggered when S3 receives a video. Is this the right approach?
You can consider creating a Docker image containing your code, dependencies, and the model. Then you can push it to ECR and create a task definition and a Fargate cluster. When the task definition is ready, you can set up a CloudWatch Events rule that is triggered upon S3 upload, and as a target you can select the Fargate resources that were created at the beginning.
There's a tutorial with a similar case available here: https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/CloudWatch-Events-tutorial-ECS.html
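Roughly what that tutorial sets up, as a boto3 sketch. Note that CloudWatch Events only sees S3 object-level API calls if CloudTrail data events are enabled for the bucket; the rule name, bucket, ARNs, and subnet below are all placeholders:

```python
import json
import boto3

events = boto3.client("events")

# Rule matching PutObject calls on the bucket (requires CloudTrail data events).
events.put_rule(
    Name="video-uploaded",
    EventPattern=json.dumps({
        "source": ["aws.s3"],
        "detail-type": ["AWS API Call via CloudTrail"],
        "detail": {
            "eventSource": ["s3.amazonaws.com"],
            "eventName": ["PutObject"],
            "requestParameters": {"bucketName": ["my-video-bucket"]},
        },
    }),
)

# Run the Fargate task definition whenever the rule matches.
events.put_targets(
    Rule="video-uploaded",
    Targets=[{
        "Id": "run-classifier-task",
        "Arn": "arn:aws:ecs:us-east-1:123456789012:cluster/my-cluster",
        "RoleArn": "arn:aws:iam::123456789012:role/ecsEventsRole",
        "EcsParameters": {
            "TaskDefinitionArn": "arn:aws:ecs:us-east-1:123456789012:task-definition/classifier:1",
            "TaskCount": 1,
            "LaunchType": "FARGATE",
            "NetworkConfiguration": {
                "awsvpcConfiguration": {
                    "Subnets": ["subnet-0123456789abcdef0"],
                    "AssignPublicIp": "ENABLED",
                }
            },
        },
    }],
)
```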
I think you're on the right track. I would configure S3 to send new-object notifications to an SQS queue, then have your EC2 instance poll the queue for pending tasks. I would probably go with ECS + Fargate for this, but EC2 also works.
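A minimal sketch of that polling loop, assuming the S3 notifications are already wired to the queue; the queue URL is a placeholder and process_video stands in for your model code:

```python
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/video-jobs"  # placeholder

def process_video(bucket: str, key: str) -> None:
    ...  # your classification / processing code

while True:
    # Long polling (WaitTimeSeconds) keeps the request count, and cost, low.
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        body = json.loads(msg["Body"])
        for record in body.get("Records", []):  # S3 notification message format
            process_video(record["s3"]["bucket"]["name"], record["s3"]["object"]["key"])
        # Delete only after successful processing; otherwise the message
        # becomes visible again after the visibility timeout and is retried.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```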
You can use AWS Elemental to split the video file and distribute the parts to different Lambda functions, so you can scale out and process it in parallel.
In my architecture, when I receive a new file on an S3 bucket, a Lambda function triggers an ECS task.
The problem occurs when I receive multiple files at the same time: the Lambda will trigger multiple instances of the same ECS task that act on the same shared resources.
I want to ensure only one instance is running for a specific ECS task. How can I do that?
Is there a specific setting that can ensure it?
I tried to query the ECS cluster before running a new instance of the ECS task, but (using the AWS Python SDK) I didn't receive any information while the task was in PROVISIONING status; the SDK only returns data when the task is in PENDING or RUNNING.
Thank you
I don't think you can control that, because your S3 event will always trigger new tasks. Checking whether the task is already running is harder, and you might miss executions if you receive a lot of files.
You should think differently to achieve what you want. If you want only one task processing, forget about triggering the ECS task from the S3 event directly. It works better if you introduce a queue: your S3 event should add the information (via Lambda, maybe?) to an SQS queue.
From there you can have an ECS service doing SQS long polling and processing one message at a time.
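A sketch of that forwarding Lambda: it just relays the bucket/key pair of each uploaded object into the queue, so the single ECS service instance can work through them sequentially. The QUEUE_URL environment variable is a placeholder you would set on the function:

```python
import json
import os
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = os.environ["QUEUE_URL"]  # placeholder, e.g. your tasks queue

def handler(event, context):
    # One SQS message per uploaded object; the lone ECS service consumer
    # on the other side processes them one at a time.
    for record in event["Records"]:
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps({
                "bucket": record["s3"]["bucket"]["name"],
                "key": record["s3"]["object"]["key"],
            }),
        )
```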
I'm new to CloudWatch Events and to Fargate. I want to trigger a Fargate task (Python) to run whenever a file is uploaded to a specific S3 bucket. I can get the task to run whenever I upload a file, and can see the name in the event log; however, I can't figure out a simple way to read the event data in Fargate. I've been researching this for the past couple of days and haven't found a solution other than reading the event log or using a Lambda to invoke the task and put the event data in a message queue.
Is there a simple way to obtain the event data in Fargate with boto3? It's likely that I'm not looking in the right places or asking the right question.
Thanks
One of the easiest options you can configure is two targets for the same S3 image-upload event:
Push the same event to SQS.
Launch the Fargate task at the same time.
Read the message event from SQS when Fargate is up (no Lambda in between). The same task definition that works for the normal use case will work here; just make sure you exit the process after reading the message from SQS.
So in this case, whenever the Fargate task comes up, it will read the message from SQS.
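A sketch of that container entrypoint, assuming the event was also delivered to a queue like the placeholder below: it reads one message, processes it, and exits so the task stops:

```python
import json
import sys
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/upload-events"  # placeholder

def main() -> None:
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20)
    messages = resp.get("Messages", [])
    if not messages:
        sys.exit(0)  # another task got the message first; nothing to do
    msg = messages[0]
    event = json.loads(msg["Body"])
    # ... process the uploaded object described by `event` ...
    sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])

if __name__ == "__main__":
    main()  # returning here stops the Fargate task, as the answer suggests
```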
To do this you would need to use an input transformer.
Each time an event rule is triggered, a JSON object is made accessible for use in the transformation.
As the event itself is not accessible within the container (unlike with Lambda functions), the idea is that you forward the key information as environment variables and work with it in your container.
At this time it does not look like every service supports this in the console, so you have the following options:
CloudFormation
Terraform
CLI
You can view a tutorial for this exact scenario from this link.
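A hedged boto3 sketch of the same idea: for an ECS target, the transformed input is applied as the task's overrides, so fields from the S3 event can be mapped into container environment variables. All ARNs and names are placeholders, the event paths assume a CloudTrail-based S3 rule, and the exact placeholder-quoting semantics of the template are worth checking against the docs:

```python
import boto3

events = boto3.client("events")

events.put_targets(
    Rule="upload-rule",  # an existing rule matching your S3 upload events
    Targets=[{
        "Id": "fargate-with-event-data",
        "Arn": "arn:aws:ecs:us-east-1:123456789012:cluster/my-cluster",
        "RoleArn": "arn:aws:iam::123456789012:role/ecsEventsRole",
        "EcsParameters": {
            "TaskDefinitionArn": "arn:aws:ecs:us-east-1:123456789012:task-definition/mytask:1",
            "TaskCount": 1,
            "LaunchType": "FARGATE",
        },
        # Pull fields out of the event, then inject them as env vars.
        "InputTransformer": {
            "InputPathsMap": {
                "bucket": "$.detail.requestParameters.bucketName",
                "key": "$.detail.requestParameters.key",
            },
            "InputTemplate": (
                '{"containerOverrides": [{"name": "mycontainer", '
                '"environment": [{"name": "S3_BUCKET", "value": <bucket>}, '
                '{"name": "S3_KEY", "value": <key>}]}]}'
            ),
        },
    }],
)
```

The container then reads S3_BUCKET and S3_KEY from os.environ, with no Lambda or queue in between.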
Is it possible to integrate AWS Lambda with Apache Kafka?
I want to put a consumer in a Lambda function. When the consumer receives a message, the Lambda function executes.
Continuing the point by Arafat: we have successfully built an infrastructure to consume from Kafka using AWS Lambdas. Here are some gotchas:
Make sure to batch consistently and commit offsets while consuming.
If you are storing the batches to S3, make sure to close your file descriptors.
If you are forwarding the batches to another service, make sure to clear the variables; variable caching across AWS Lambda invocations can result in memory overflows.
A good idea is to check how much time you have left via the context object in the Lambda, and give yourself some wiggle room to flush the buffer your consumer populated, which might not be written to a file unless you call close().
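A minimal sketch of such a consumer using the kafka-python package; the topic, brokers, group id, and the process helper are placeholders. The point is the commit-per-batch and the remaining-time check from the context object:

```python
from kafka import KafkaConsumer  # pip install kafka-python

def handler(event, context):
    consumer = KafkaConsumer(
        "my-topic",                          # placeholder
        bootstrap_servers=["broker1:9092"],  # placeholder
        group_id="lambda-consumers",
        enable_auto_commit=False,  # commit explicitly, per batch
    )
    try:
        # Leave ~10s of headroom to flush buffers before Lambda is frozen.
        while context.get_remaining_time_in_millis() > 10_000:
            batch = consumer.poll(timeout_ms=1000, max_records=100)
            for _, records in batch.items():
                for record in records:
                    process(record.value)  # hypothetical per-message handler
            if batch:
                consumer.commit()  # commit offsets only after the batch is handled
    finally:
        consumer.close()  # flushes buffers and releases file descriptors

def process(value: bytes) -> None:
    ...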
We are using Apache Airflow for scheduling; I hear CloudWatch can do that too.
Here is the AWS article on scheduled Lambdas.
Given your Kafka installation will be running in a VPC, best practice is to configure your Lambda to run within the VPC as well; this will simplify the security group configuration for the EC2 instances running Kafka.
Here is the AWS blog article on configuring Lambdas to run in a VPC.
Yes, it is very much possible to have a Kafka consumer in an AWS Lambda function.
However, note that you would not be able to invoke the Lambda using some sort of notification. You will instead have to poll the Kafka topic, and the easiest way to do that is with a scheduled Lambda.
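A boto3 sketch of wiring up such a scheduled Lambda; the function ARN, rule name, and rate are placeholders:

```python
import boto3

events = boto3.client("events")
lambda_client = boto3.client("lambda")

FUNCTION_ARN = "arn:aws:lambda:us-east-1:123456789012:function:kafka-poller"  # placeholder

# Fire the rule every minute; the Lambda polls the Kafka topic when invoked.
rule = events.put_rule(Name="poll-kafka", ScheduleExpression="rate(1 minute)")

# Allow CloudWatch Events to invoke the function.
lambda_client.add_permission(
    FunctionName=FUNCTION_ARN,
    StatementId="allow-events-poll-kafka",
    Action="lambda:InvokeFunction",
    Principal="events.amazonaws.com",
    SourceArn=rule["RuleArn"],
)

events.put_targets(Rule="poll-kafka", Targets=[{"Id": "kafka-poller", "Arn": FUNCTION_ARN}])
```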
If you are using managed Apache Kafka in AWS (MSK):
Since August 2020 you can connect Amazon Managed Streaming for Apache Kafka (MSK) as an event source. This does not cover your own installed Kafka cluster, but if you already use AWS-managed Kafka, it could be useful.
More in the announcement: https://aws.amazon.com/about-aws/whats-new/2020/08/aws-lambda-now-supports-amazon-managed-streaming-for-apache-kafka-as-an-event-source/
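Wiring this up amounts to creating an event source mapping; a boto3 sketch with a placeholder cluster ARN, function name, and topic:

```python
import boto3

lambda_client = boto3.client("lambda")

lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:kafka:us-east-1:123456789012:cluster/my-msk/abc-123",  # placeholder
    FunctionName="my-consumer-function",  # placeholder
    Topics=["my-topic"],
    StartingPosition="LATEST",
    BatchSize=100,  # records delivered per invocation
)
```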
AWS now supports "self-hosted Apache Kafka as an event source for AWS Lambda"
When you create a new Lambda, in the "Configuration" tab, click "Add trigger"; you can then select and configure your self-hosted Apache Kafka.
Feel free to read more here:
https://aws.amazon.com/blogs/compute/using-self-hosted-apache-kafka-as-an-event-source-for-aws-lambda/
https://docs.aws.amazon.com/lambda/latest/dg/kafka-smaa.html
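The self-hosted variant of the same event source mapping, per the docs above, as a boto3 sketch; the bootstrap servers, function name, topic, and the Secrets Manager ARN holding the SASL credentials are all placeholders:

```python
import boto3

lambda_client = boto3.client("lambda")

lambda_client.create_event_source_mapping(
    FunctionName="my-consumer-function",  # placeholder
    Topics=["my-topic"],
    StartingPosition="LATEST",
    # Your own cluster's brokers instead of an MSK ARN.
    SelfManagedEventSource={
        "Endpoints": {"KAFKA_BOOTSTRAP_SERVERS": ["kafka1.example.com:9092"]}
    },
    # Credentials for the cluster, stored in Secrets Manager.
    SourceAccessConfigurations=[
        {"Type": "BASIC_AUTH",
         "URI": "arn:aws:secretsmanager:us-east-1:123456789012:secret:kafka-creds"}
    ],
)
```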
There is a community-provided Kafka Connector for AWS Lambda. This solution would require you to run the connector somewhere such as EC2 or ECS.
I am working with PHP.
I have a program that writes messages to Amazon SQS.
Can anybody tell me how I can use the Lambda service to get data from SQS and push it into MySQL? The Lambda should be triggered whenever a new record is added to the queue.
Can somebody share the steps or code that will help me get through this task?
There isn't any official way to link SQS and Lambda at the moment. Have you looked into using an SNS topic instead of an SQS queue?
Agree with Mark B.
Ways to get events over to Lambda:
Use SNS: http://docs.aws.amazon.com/sns/latest/dg/sns-lambda.html
Use SNS -> SQS: have the Lambda launched by the SNS notification, and just use it to drain whatever is in the SQS queue.
Use Kinesis.
Alternatively, have the Lambda run by a cron job to read SQS. It depends on the latency you need: if you require messages to be processed immediately, this is not the solution, because you would be running the Lambda all the time.
An important note for using SQS: you are charged per request even if no messages are waiting, so do not do fast polls, even in your Lambdas. It is easy to run up a huge bill doing nothing. That is also a good reason to set up CloudWatch on the account to monitor usage and charges.
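Putting the cron option and the billing note together, a sketch of a scheduled Lambda that long-polls the queue and writes rows into MySQL with PyMySQL; the queue URL, table name, and DB environment variables are placeholders:

```python
import os
import boto3
import pymysql  # pip install pymysql, packaged with the function

sqs = boto3.client("sqs")
QUEUE_URL = os.environ["QUEUE_URL"]  # placeholder, set on the function

def handler(event, context):
    conn = pymysql.connect(
        host=os.environ["DB_HOST"],
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASSWORD"],
        database=os.environ["DB_NAME"],
    )
    try:
        with conn.cursor() as cur:
            # Keep enough headroom for one more 20s long poll plus the inserts.
            while context.get_remaining_time_in_millis() > 25_000:
                # Long polling: one request waits up to 20s instead of spinning,
                # which keeps the per-request charges down.
                resp = sqs.receive_message(
                    QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20
                )
                messages = resp.get("Messages", [])
                if not messages:
                    break  # queue drained; exit instead of burning billed time
                for msg in messages:
                    cur.execute("INSERT INTO messages (body) VALUES (%s)", (msg["Body"],))
                    sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
            conn.commit()
    finally:
        conn.close()
```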