Running periodic tasks on Elastic Beanstalk workers with a FIFO queue

I was trying to set up a periodic task (using cron.yaml) in an EB worker environment that uses a FIFO SQS queue. When the cron job tries to submit the job to SQS, it fails because the message does not have a MessageGroupId, which is required for FIFO queues.
Is there a way around this? (Apart from using some other scheduling mechanism or a standard queue.)
scheduler: dropping leader, due to failed to send message for job
'italian-job', because: The request must contain the parameter
MessageGroupId. (Aws::SQS::Errors::MissingParameter)
Update: As a workaround, I created a CloudWatch trigger to start a Lambda which sends messages to the SQS queue.
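For anyone after the same workaround, here is a minimal sketch of such a scheduled Lambda; the queue URL, group id and job name are placeholders, not taken from the original setup:

```python
# Minimal sketch of the workaround: a CloudWatch/EventBridge-scheduled Lambda
# sends the periodic-task message to the FIFO worker queue itself, supplying
# the MessageGroupId that the cron.yaml scheduler cannot set.
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/worker-queue.fifo"  # placeholder

def handler(event, context):
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"job": "italian-job"}),
        MessageGroupId="periodic-tasks",                # required for FIFO queues
        MessageDeduplicationId=context.aws_request_id,  # or enable content-based deduplication
    )
```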

Related

Triggering an Airflow DAG once a message arrives at an AWS SQS queue

Is it possible to schedule a DAG run when a message arrives at the SQS queue? I also need the DAG to process the message in the queue. From what I know this could be done using the SQSSensor, but I couldn't find any examples and I am unsure how to move forward.
Airflow runs DAGs on a fixed interval, while you're now looking to trigger DAGs per event. You'll have to do this outside of Airflow, e.g. using a Lambda trigger listening on the queue, which triggers an Airflow DAG via the REST API.
The SQSSensor in Airflow won't allow for event-by-event processing because it simply polls the queue after a DAG run starts (checking for new messages, pushing them to an XCom with key "messages", and deleting the messages if found). So if your DAG is scheduled to run once a day, an SQSSensor would only start polling for new messages once a day.
I can't find an SQSOperator in Airflow for reading SQS messages, so to create an event-triggered SQS + Airflow workflow, my best guess is to set up a Lambda that triggers Airflow DAGs via the REST API. The DAG itself would start with an SQSSensor that reads all messages on the queue, followed by tasks that read and process the values from the XCom created by the SQSSensor task. The schedule_interval of the DAG can be set to None since it will be triggered via the REST API.
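As a rough illustration of the Lambda-plus-REST-API part (the Airflow host, credentials and DAG id are assumptions, not something from the question), the function could look roughly like this:

```python
# Hedged sketch: a Lambda triggered by the SQS queue that starts an Airflow DAG
# run via the stable REST API (POST /api/v1/dags/{dag_id}/dagRuns).
# Host, credentials and DAG id are placeholders.
import base64
import json
import urllib.request

AIRFLOW_ENDPOINT = "https://airflow.example.com/api/v1/dags/process_sqs_messages/dagRuns"  # placeholder
AUTH_HEADER = "Basic " + base64.b64encode(b"user:password").decode()  # placeholder credentials

def handler(event, context):
    # Trigger a DAG run; the DAG's own SQSSensor then reads the queue.
    payload = json.dumps({"conf": {}}).encode()
    request = urllib.request.Request(
        AIRFLOW_ENDPOINT,
        data=payload,
        headers={"Content-Type": "application/json", "Authorization": AUTH_HEADER},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```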

Message -> SQS vs Message -> SNS -> SQS

I have a task generator that sends task messages to an SQS queue and a group of workers that poll the SQS queue to process the tasks. In this case, is there any benefit in having the task generator publish messages to an SNS topic first, with the SQS queue subscribed to that topic? I assume publishing directly to the SQS queue is enough.
Assuming you don't need to fan out the messages to different types of workers, and your workers are all doing the same job, then no, you don't.
Each worker can take and process one message.
One thing to be aware of is the visibility timeout before messages become visible on SQS again; not configuring the timeout correctly could cause another worker to process the same message.
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-visibility-timeout.html
When a consumer receives and processes a message from a queue, the message remains in the queue. Amazon SQS doesn't automatically delete the message. Because Amazon SQS is a distributed system, there's no guarantee that the consumer actually receives the message (for example, due to a connectivity issue, or due to an issue in the consumer application). Thus, the consumer must delete the message from the queue after receiving and processing it.

Visibility Timeout

Immediately after a message is received, it remains in the queue. To prevent other consumers from processing the message again, Amazon SQS sets a visibility timeout, a period of time during which Amazon SQS prevents other consumers from receiving and processing the message. The default visibility timeout for a message is 30 seconds. The minimum is 0 seconds. The maximum is 12 hours. For information about configuring visibility timeout for a queue using the console…
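As a rough boto3 sketch of those knobs (the queue URL and all values are placeholders), the visibility timeout can be set queue-wide, overridden per receive, or extended mid-processing:

```python
# Sketch of the visibility-timeout settings described above; the timeout should
# comfortably exceed your worst-case processing time so another worker doesn't
# receive the same message while it is still being processed.
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/task-queue"  # placeholder

# Queue-wide default (in seconds).
sqs.set_queue_attributes(QueueUrl=QUEUE_URL, Attributes={"VisibilityTimeout": "120"})

# Per-receive override for a consumer that needs longer.
resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, VisibilityTimeout=300)

# Extend the timeout mid-processing if a message is taking longer than expected.
for msg in resp.get("Messages", []):
    sqs.change_message_visibility(
        QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"], VisibilityTimeout=600
    )
```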

How to prevent AWS Lambda from deleting a message from an SQS queue automatically and instead delete it programmatically?

When a file is added to my S3 bucket, an S3 PUT event is triggered which puts a message into SQS. I've configured a Lambda to be triggered as soon as a message is available.
In the Lambda function, I'm sending an API request to run a task on an ECS Fargate container, with environment variables containing the message received from SQS. In the container I use the message to download the file from S3 and process it, and on successful processing I want to delete the message from SQS.
However, the message gets deleted from SQS automatically after my Lambda executes.
Is there any way to configure the Lambda not to delete the SQS message automatically (other than purposely raising an exception and failing the Lambda), so that I can delete the message programmatically from my container?
Update:
Consider this scenario, which I wish to achieve:
1. Message enters the SQS queue.
2. Lambda takes the message, calls the ECS API, and finishes without deleting the message from the queue.
3. The message is now in flight.
4. The ECS container runs the task and deletes the message from the queue on successful processing.
5. If the container fails, the message re-enters the queue after the visibility timeout, the Lambda is triggered again, and the cycle repeats from step 1.
6. Only if the container fails more than a certain number of times does the message go from in-flight to the DLQ.
This all currently works only if I purposely raise an exception in the Lambda, and I'm looking for a similar solution without doing that.
The behaviour is intended: as long as SQS is configured as a Lambda trigger, once the function returns successfully (i.e. completes execution without error) the message is automatically deleted.
The way I see it, to achieve the behaviour you're describing you have 4 options:
Remove SQS as the Lambda trigger and instead execute the Lambda function on a schedule and poll the queue yourself. The Lambda will read the messages that are available, but unless you delete them explicitly they will become visible again once their visibility timeout expires. You can achieve this with a CloudWatch schedule (see the sketch after this list).
Remove SQS as the Lambda trigger and instead execute the Lambda function explicitly. Similar to the above, but instead of executing on a schedule all the time, the Lambda function would be triggered by the producer of the message itself.
Keep the SQS Lambda trigger and store the message in an alternative SQS Queue (as suggested by #jarmod in a comment above).
Configure the producer of the message to publish to an SNS topic and subscribe two SQS queues to this topic. One of the two queues will trigger a Lambda function, the other one will be used by your ECS tasks.
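A rough sketch of option 1 (the queue URL, cluster, task definition, container name and network configuration are all assumptions): a scheduled Lambda polls the queue, hands the body and receipt handle to an ECS task, and deliberately does not delete the message, leaving deletion to the container:

```python
# Sketch of option 1: poll SQS from a scheduled Lambda, launch an ECS task per
# message, and leave deletion to the container. If the container never deletes
# the message, it becomes visible again after the visibility timeout.
import boto3

sqs = boto3.client("sqs")
ecs = boto3.client("ecs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/file-events"  # placeholder

def handler(event, context):
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=5)
    for msg in resp.get("Messages", []):
        ecs.run_task(
            cluster="processing-cluster",       # placeholder
            taskDefinition="file-processor",    # placeholder
            launchType="FARGATE",
            networkConfiguration={
                "awsvpcConfiguration": {"subnets": ["subnet-0123456789abcdef0"]}  # placeholder
            },
            overrides={
                "containerOverrides": [{
                    "name": "processor",        # placeholder container name
                    "environment": [
                        {"name": "MESSAGE_BODY", "value": msg["Body"]},
                        {"name": "RECEIPT_HANDLE", "value": msg["ReceiptHandle"]},
                    ],
                }]
            },
        )
        # The container calls sqs.delete_message(QueueUrl=..., ReceiptHandle=...)
        # only after successful processing.
```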
Update
Based on the new info provided, you have another option:
Leave the event flow as it is and let the message in SQS be deleted by Lambda. Then, in your ECS task, handle the failure state and put a new message on the queue with the same payload/body. This will allow you to retry indefinitely.
There's no reason why the SQS message has to be exactly the same; what you're interested in is the body/payload.
You might want to consider adding a mechanism to set a limit on these retries and post a message to a DLQ once the limit is reached.
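A hedged sketch of that retry mechanism (queue URLs, environment variables and the attempt limit are all assumptions): on failure the ECS task re-enqueues the same body with an attempt counter and diverts to a DLQ once the limit is hit:

```python
# Sketch of failure handling inside the ECS task: re-enqueue the same body with
# an incremented attempt counter, or send it to a DLQ once MAX_ATTEMPTS is hit.
import os
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = os.environ["QUEUE_URL"]   # placeholder, passed in by the Lambda
DLQ_URL = os.environ["DLQ_URL"]       # placeholder
MAX_ATTEMPTS = 5                      # assumed retry limit

def handle_failure(body: str, attempt: int) -> None:
    target = QUEUE_URL if attempt < MAX_ATTEMPTS else DLQ_URL
    sqs.send_message(
        QueueUrl=target,
        MessageBody=body,
        MessageAttributes={
            "attempt": {"DataType": "Number", "StringValue": str(attempt + 1)}
        },
    )
```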
One solution I can think of is: remove the Lambda triggered by the SQS queue and create a CloudWatch alarm on the SQS queue. When the alarm triggers, scale out the ECS tasks; when there are no items in the queue, scale the ECS tasks back down. Let the ECS tasks just poll the queue and handle all the messages.
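One way to wire up that queue-depth-driven scaling is a target-tracking policy on the ECS service via Application Auto Scaling, which manages the CloudWatch alarms for you (a sketch, not the answerer's exact setup; resource names, capacities and the target value are assumptions):

```python
# Sketch: scale an ECS service's desired count against the number of visible
# messages in the SQS queue. Target tracking creates and manages the alarms.
import boto3

autoscaling = boto3.client("application-autoscaling")
RESOURCE_ID = "service/processing-cluster/file-processor"  # placeholder cluster/service

autoscaling.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId=RESOURCE_ID,
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=0,
    MaxCapacity=10,
)

autoscaling.put_scaling_policy(
    PolicyName="scale-on-queue-depth",
    ServiceNamespace="ecs",
    ResourceId=RESOURCE_ID,
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 10.0,  # aim to keep roughly 10 visible messages in the queue
        "CustomizedMetricSpecification": {
            "MetricName": "ApproximateNumberOfMessagesVisible",
            "Namespace": "AWS/SQS",
            "Dimensions": [{"Name": "QueueName", "Value": "file-events"}],  # placeholder
            "Statistic": "Average",
        },
    },
)
```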

Triggering a cron job on AWS manually by sending a message via SQS

I set up cron jobs on Elastic Beanstalk using a cron.yaml file, and SQS runs my tasks periodically. Is there a way to trigger a cron job manually through SQS so that, for infrequently running tasks, I can easily test the results without waiting for the schedule itself? I tried to send a message to the SQS queue attached to the EB instance, but I can't set the HTTP headers required for the cron job.

Improving AWS SQS queue performance

I tried implementing an AWS SQS queue to minimise the database interaction from the backend server, but I am having issues with it.
I have one consumer process that looks for messages from one SQS queue.
A JSON message is placed in the SQS queue when Clients click on a button in a web interface.
A backend job in the app server picks up the JSON message from the SQS queue, deletes the message from the queue and processes it.
To test the functionality, I implemented the logic for one client and it ran fine. However, when I added 3 more clients it stopped working properly: the SQS queue backed up with 500 messages even though the backend job was reading from the queue correctly.
Do I need to increase the number of backend jobs, or increase the number of client SQS queues? Right now all the clients send their messages to the same queue.
How do I calculate the number of backend jobs required? Also, is there any setting to make SQS work faster?
Having messages stored in a queue is good - in fact, that's the purpose of using a queue.
If your backend systems cannot consume messages at the rate that they are produced, the queue will act as a buffer to retain the messages until they can be processed. A good example is this AWS re:Invent presentation where a queue is shown with more than 200 million messages: Building Elastic, High-Performance Systems with Amazon SQS and Amazon SNS
If it is important to process the messages quickly, then scale your consumers to match the rate of message production (or faster, so you can consume backlog).
You mention that your process "picks up the JSON message from the SQS queue, deletes the message from the queue and processes it". Please note that best practice is to receive a message from the queue, process it, and only then delete it (after it is fully processed). This way, if your process fails, the message will automatically reappear on the queue after the visibility timeout expires. This makes your application more resilient to failure.
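A minimal sketch of that receive, process, then delete pattern (the queue URL and process() are placeholders):

```python
# Minimal consumer loop: receive with long polling, process, and only then
# delete. If process() raises, the message is not deleted and reappears after
# the visibility timeout. Scale by running more copies of this consumer.
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/client-events"  # placeholder

def process(payload: dict) -> None:
    ...  # your business logic

while True:
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20  # long polling
    )
    for msg in resp.get("Messages", []):
        process(json.loads(msg["Body"]))
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```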