For my web app I will need a separate EC2 instance to handle CPU-intensive tasks, and tasks that can be queued so they don't burden the web-serving instance, such as image resizing and sending email.
When you create an AWS Elastic Beanstalk environment it asks you to choose between a "web" and a "worker" environment. From my understanding, the worker environment is where I would process those kinds of tasks.
What's the role of SQS in this context? I read that it is only about sending "messages", but how will I get my image resized with a "message"?
Should I write specific, distinct code for the worker instance to handle image resizing, and then use SQS to tell it to process the image? Can SQS then pass the image from the web instance to the worker instance? I'm completely missing the main concept.
A queuing service (such as Amazon SQS) is used to store messages for later retrieval. Think of it like a TODO list -- you add items to the queue, and later you retrieve an item from the queue and take action on the item.
For example, let's say users upload images to a website and you wish to generate thumbnails from those images. Your website will store the image in Amazon S3, then push a message into an SQS queue. The message would include a reference to the image in S3 and details about the user.
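A minimal sketch of that "push a message" step, assuming the Python SDK (boto3). The field names, bucket, key and queue URL are made up for illustration; the point is that the message only carries a reference to the image, not the image itself.

```python
import json


def build_resize_message(bucket, key, user_id):
    """Return the JSON body for one resize job (field names are illustrative)."""
    return json.dumps({"bucket": bucket, "key": key, "user_id": user_id})


def enqueue_resize_job(sqs_client, queue_url, bucket, key, user_id):
    """Push one job onto the queue; the image itself stays in S3."""
    body = build_resize_message(bucket, key, user_id)
    return sqs_client.send_message(QueueUrl=queue_url, MessageBody=body)


# Usage from the web app (needs boto3, AWS credentials and a real queue):
#   import boto3
#   sqs = boto3.client("sqs")
#   enqueue_resize_job(sqs, "<your queue URL>", "my-uploads", "images/cat.jpg", "user-42")
```

Passing the client in as an argument keeps the function easy to test without touching AWS.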
Then, your Elastic Beanstalk worker will receive a message from the queue and process the image. It would retrieve the image from S3, resize it, store the result in an S3 bucket, then perhaps email the user to say that the job is finished. In a worker environment, a daemon on each instance reads messages from the queue and sends each one to your application as an HTTP POST request; when the application responds with 200 OK, the daemon deletes the message and moves on to the next one.
So, yes -- you will create the worker code. Elastic Beanstalk will invoke the worker with the SQS message. SQS itself doesn't trigger anything -- it is actually the Elastic Beanstalk daemon that retrieves the message and passes it to the worker.
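Here is a rough sketch of what the worker code could look like, using only the Python standard library. As far as I know, the worker environment delivers each message to your application as a local HTTP POST, and a 200 response tells it the job is done. The message fields, the port and `resize_image` are placeholders I made up, not anything Elastic Beanstalk prescribes.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def resize_image(bucket, key):
    """Placeholder: download from S3, resize, upload the result."""
    print(f"would resize s3://{bucket}/{key}")


def handle_job(body_bytes, process=resize_image):
    """Process one message body and return the HTTP status to send back."""
    try:
        job = json.loads(body_bytes)
        bucket, key = job["bucket"], job["key"]
    except (ValueError, KeyError, TypeError):
        return 400  # non-200: the message is not treated as done
    process(bucket, key)
    return 200  # 200: the message can be removed from the queue


class WorkerHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status = handle_job(self.rfile.read(length))
        self.send_response(status)
        self.end_headers()


def serve(port=8080):
    """Run the worker's HTTP endpoint (the port is an assumption)."""
    HTTPServer(("", port), WorkerHandler).serve_forever()
```

Keeping the job logic in `handle_job` means it can be exercised without starting a server.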
See: Elastic Beanstalk worker environments
Related
I have a local program that takes a video as input, uses a TensorFlow model to do object classification, and then does a bunch of processing on the objects. I want to get this running in AWS, but there is a dizzying array of AWS services. My desired flow is:
video gets uploaded to S3 --> do classification and processing on each frame of said video --> store results in S3.
I've used Lambda for similar work, but this program relies on 2 different models and its overall size is ~800 MB.
My original thought is to run an EC2 instance that can be triggered when S3 receives a video. Is this the right approach?
You can consider creating a Docker image containing your code, dependencies, and the model. Then, you can push it to ECR and create an ECS task definition and a cluster that runs on Fargate. When the task definition is ready, you can set up a CloudWatch Events rule that is triggered by the S3 upload and, as its target, select the Fargate resources you created at the beginning.
There's a tutorial with a similar case available here: https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/CloudWatch-Events-tutorial-ECS.html
I think you're on the right track. I would configure S3 to send new object notifications to an SQS queue. Then you can have your EC2 instance poll the queue for pending tasks. I would probably go with ECS + Fargate for this, but EC2 also works.
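If you go the S3 notification to SQS route, each message body is a JSON document with a `Records` list. A small sketch of pulling the bucket and key out of it (standard library only; the function name is my own):

```python
import json
from urllib.parse import unquote_plus


def s3_objects_from_message(body):
    """Return a list of (bucket, key) pairs from one S3 event notification body."""
    event = json.loads(body)
    objects = []
    # A body without "Records" (e.g. a test notification) yields an empty list.
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded, with spaces encoded as "+".
        objects.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return objects
```

The worker can then download each object from S3 and run the classification on it.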
You can use an AWS Elemental service (such as MediaConvert) to split the video file and distribute the parts to different Lambda functions, so the work can scale and be processed in parallel.
In my architecture, when I receive a new file in an S3 bucket, a Lambda function triggers an ECS task.
The problem occurs when I receive multiple files at the same time: the Lambda function triggers multiple instances of the same ECS task, which all act on the same shared resources.
I want to ensure that only one instance of a specific ECS task is running at a time. How can I do that?
Is there a specific setting that can ensure it?
I tried to query the ECS cluster before running a new instance of the ECS task, but (using the AWS Python SDK) I didn't receive any information while the task was in PROVISIONING status; the SDK only returns data when the task is in PENDING or RUNNING.
Thank you
I don't think you can control that, because your S3 event will trigger new tasks. It will be more difficult to check whether the task is already running, and you might miss executions if you receive a lot of files.
You should think differently to achieve what you want. If you want only one task doing the processing, then forget about triggering the ECS task from the S3 event. It might work better if you implement queues. Your S3 event should add the information (via Lambda, maybe?) to an SQS queue.
From there you can have an ECS service doing SQS long polling and processing one message at a time.
I am getting familiar with queue services in Amazon.
SQS is pull-based, not push-based, so I have to have an EC2 instance pulling the messages out of the queue.
Are those instances EC2 AMI VMs? Or, when I create an SQS queue, do I have to associate it with a special EC2 instance?
Why can we lose an EC2 instance while it is reading from a queue?
Any computer on the Internet can make a ReceiveMessage() API call. This could be an Amazon EC2 instance, or an AWS Lambda function, or a container or even the computer under your desk.
The typical architecture is that some 'worker' code is running somewhere, and it polls the Amazon SQS queue to ask for a message. If a message is available, the worker then processes the message and then deletes the message.
So, simply include the code to 'pull' the message within the program that will process the message.
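A minimal sketch of that poll, process, delete cycle, assuming boto3's SQS client (`receive_message` and `delete_message`). The `process` callback and the queue URL are whatever your program supplies; this is not a full worker with error handling.

```python
def poll_once(sqs_client, queue_url, process, wait_seconds=20):
    """Receive at most one message, process it, then delete it.

    Returns True if a message was handled, False if the queue was empty.
    """
    response = sqs_client.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=1,
        WaitTimeSeconds=wait_seconds,  # long polling; 20 seconds is the maximum
    )
    messages = response.get("Messages", [])
    if not messages:
        return False
    message = messages[0]
    # If process() raises, the message is not deleted and becomes visible again later.
    process(message["Body"])
    sqs_client.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
    return True


def run_worker(sqs_client, queue_url, process):
    """Poll forever; run this wherever the worker lives (EC2, a container, your desk)."""
    while True:
        poll_once(sqs_client, queue_url, process)


# Usage (needs boto3 and AWS credentials):
#   import boto3
#   run_worker(boto3.client("sqs"), "<your queue URL>", print)
```

Deleting only after processing succeeds is what lets another worker pick the message up again if this one dies halfway through.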
We plan to run a Java application on Elastic Beanstalk. It is nothing but a file retriever, processor, transformer and mapper. It would retrieve the file from S3 and map it to an RDS DB. The question is: how do I trigger this application running on Beanstalk when a file arrives in the S3 bucket, and also on demand?
Thanks and regards,
Kunal
You can send an event to SNS (notification topic) on S3 file upload.
Then I see two options:
You can hook up a Lambda function or an HTTP invocation to the SNS topic; in that case, however, you will need to handle failures and availability issues yourself.
If your application is running on EC2, I'd suggest sending the upload event to SQS (queue service) and having your application poll the queue for messages.
also on demand
For that you need to expose an interface or service from your application. You did not specify what your application is, so it's really up to you to define what 'on demand' means.
I have an AWS-hosted website, which takes images for processing, and adds them to the SQS. Is it possible to automatically start the processing instance whenever there is something in the queue using AWS services, or should I do it manually in my backend's code?
Yes, you can use EC2 and SQS in conjunction. Please go through this article: https://aws.amazon.com/articles/1464
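One way to picture "start processing when there is something in the queue" is to look at the queue depth and derive how many workers you want. This is only a sketch under my own assumptions (boto3's `get_queue_attributes`, and made-up `per_worker` and `max_workers` values), not what the linked article does.

```python
def pending_messages(sqs_client, queue_url):
    """Return the approximate number of messages waiting in the queue."""
    response = sqs_client.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages"],
    )
    # SQS returns attribute values as strings.
    return int(response["Attributes"]["ApproximateNumberOfMessages"])


def desired_workers(pending, per_worker=10, max_workers=5):
    """Hypothetical policy: one worker per `per_worker` messages, capped at `max_workers`."""
    if pending <= 0:
        return 0
    return min(max_workers, -(-pending // per_worker))  # ceiling division
```

With an empty queue this asks for zero workers, so the processing instance only needs to run while there is work waiting.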