I have a scenario where I am sending a message to a request queue, which in turn triggers a Lambda.
That Lambda performs some operation and writes the data to a response queue (AWS SQS).
How do I retrieve, from the response queue, the specific message corresponding to the request I sent, and pass it on to the next step?
Steps:
I have a Step Function.
At step 1, I am sending the input data to a request queue (say ABCQueue).
This queue triggers a Lambda.
The Lambda processes the message and writes data to a response queue (say XYZQueue).
The next step in my Step Function is to receive the message from the response queue.
How do I maintain the correlation here, i.e. how do I make sure that the only data I receive from the response queue is the data corresponding to the request I sent?
Note: there will be multiple requests coming in, and each request will complete at a different time.
Is there any unique ID by which I can check that what I receive back from the response queue corresponds to what I sent?
NOTE: I am integrating this with Java.
I have tried using a task token with the "wait for callback" (waitForTaskToken) pattern.
But that task token is consumed once I send success from the Lambda,
and when the state machine moves to the next state, i.e. receive message, the task token is no longer available and I can't get the corresponding message.
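For reference, here is a minimal sketch of the callback pattern I tried, assuming the AWS SDK for Java v2 and assuming the task token is passed into ABCQueue as a message attribute named TaskToken (a name chosen only for illustration). With waitForTaskToken, whatever the Lambda sends via SendTaskSuccess becomes the output of the waiting state itself:

```java
// Minimal sketch of the waitForTaskToken callback pattern, assuming AWS SDK
// for Java v2 plus aws-lambda-java-events; the "TaskToken" message attribute
// name is a hypothetical choice made in the state machine definition.
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.SQSEvent;
import software.amazon.awssdk.services.sfn.SfnClient;
import software.amazon.awssdk.services.sfn.model.SendTaskSuccessRequest;

public class AbcQueueHandler implements RequestHandler<SQSEvent, Void> {

    private final SfnClient sfn = SfnClient.create();

    @Override
    public Void handleRequest(SQSEvent event, Context context) {
        for (SQSEvent.SQSMessage record : event.getRecords()) {
            // The task token Step Functions generated for this particular request.
            String taskToken = record.getMessageAttributes()
                                     .get("TaskToken")
                                     .getStringValue();

            String result = process(record.getBody()); // business logic placeholder

            // Sending the result via SendTaskSuccess makes it the output of the
            // waiting state, so correlation is handled by the token rather than
            // by a separate "receive from XYZQueue" step.
            sfn.sendTaskSuccess(SendTaskSuccessRequest.builder()
                    .taskToken(taskToken)
                    .output(result) // must be a JSON string
                    .build());
        }
        return null;
    }

    private String process(String body) {
        return "{\"status\":\"done\"}";
    }
}
```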
I have a pretty standard setup of feeding SQS to Lambda. The lambda reads the message and makes a web request to a defined endpoint.
If I encounter an exception during processing of the SQS message that is due to the form of the message then I put the message on a dead letter queue.
If I encounter an error with the web request, I put the message back on the feeding queue to make the HTTP request at a later time.
This seems to work fine, but we just ran into an issue where an HTTP endpoint was down for 4 days and the feeding queue dropped the message. I imagine this has something to do with the retention period setting of the queue.
Questions
Is there a way to know, in the lambda, how many times a message has been replayed?
How did the feeder queue know that the message that was re-enqueued was the same as the one that was originally put on the queue?
I'm currently not explicitly deleting a message off the queue. Not having that, hasn't seemed to cause any issues, no re-processing of messages or anything. Should I be explicitly deleting them?
The normal process would be:
The AWS Lambda function is triggered, with the message(s) passed via the event parameter
If the Lambda function successfully processes the message(s), it should return normally ('success') and the messages are automatically removed from the queue
If the Lambda function is unable to process a message, it should signal failure (eg by throwing an error) and Amazon SQS will automatically make the message available for re-processing (unless it has exceeded the maximum receive count)
If the Lambda function fails (eg due to a timeout), Amazon SQS will likewise make the message available for re-processing (unless it has exceeded the maximum receive count)
If a message has exceeded its maximum receive count, Amazon SQS will move the message to the Dead Letter Queue
To answer your questions:
If you wish to take responsibility for these activities yourself, you can use the ApproximateReceiveCount attribute on the message. When calling ReceiveMessage, it appears that you should add AttributeNames=['ApproximateReceiveCount'], but the documentation is a bit contradictory; you might need to use All instead. (When Lambda is invoked via the SQS event source, the attribute is already present on each record; see the sketch after this answer.)
Since you are sending a new message to the queue, Amazon SQS is not aware that it is the same message. The message is not 're-enqueued' since it is a new message.
When your Lambda function returns 'success' (200), the message is being deleted off the queue for you.
You might consider using the standard functionality for retries and Dead Letter Queues rather than implementing that logic yourself.
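As an illustration of the first point, a minimal sketch assuming your function is written in Java and triggered by the SQS event source mapping: each record already carries ApproximateReceiveCount, and throwing from the handler is what signals failure, so SQS retries the message and eventually moves it to the Dead Letter Queue via the queue's redrive policy.

```java
// Minimal sketch: reading ApproximateReceiveCount inside an SQS-triggered
// Lambda handler (aws-lambda-java-events). Throwing signals failure, so SQS
// makes the message visible again and eventually moves it to the DLQ.
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.SQSEvent;

public class WebhookForwarder implements RequestHandler<SQSEvent, Void> {

    @Override
    public Void handleRequest(SQSEvent event, Context context) {
        for (SQSEvent.SQSMessage record : event.getRecords()) {
            int receiveCount = Integer.parseInt(
                    record.getAttributes().get("ApproximateReceiveCount"));
            context.getLogger().log("Attempt " + receiveCount
                    + " for message " + record.getMessageId());

            try {
                callEndpoint(record.getBody()); // hypothetical HTTP call
            } catch (Exception e) {
                // Re-throw so SQS retries; after maxReceiveCount the queue's
                // redrive policy moves the message to the DLQ automatically.
                throw new RuntimeException("Endpoint call failed", e);
            }
        }
        return null;
    }

    private void callEndpoint(String body) {
        // web request goes here
    }
}
```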
I am trying to find documentation on how requests are handled if there is a network error between the client and AWS SQS.
Since one request can contain up to 10 messages, what happens if there is an HTTP error while the 10 messages are being sent to SQS? Does SQS fail the whole request, so that all 10 messages have to be re-published? Or do some messages get processed, so that re-sending the whole batch would introduce duplicates?
I am trying to determine whether I can cut costs by aggregating messages into a single request, but I wonder whether this will hurt idempotency.
When you send messages as a batch, they are still delivered and processed one by one, and each received message has its own unique receipt handle, so you should delete each message programmatically after processing it to avoid duplication.
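On the sending side, a minimal sketch assuming the AWS SDK for Java v2 (queue URL and bodies are placeholders): SendMessageBatch reports success or failure per entry, so some entries can be accepted while others fail. If the whole HTTP request fails before a response arrives you cannot know which entries were accepted, and re-sending the batch can duplicate messages; standard queues are at-least-once delivery in any case, so consumers should be idempotent.

```java
// Minimal sketch of batch sending with per-entry failure handling,
// assuming AWS SDK for Java v2. Queue URL and message bodies are placeholders.
import java.util.List;
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.*;

public class BatchSender {
    public static void main(String[] args) {
        SqsClient sqs = SqsClient.create();
        String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue";

        List<SendMessageBatchRequestEntry> entries = List.of(
                SendMessageBatchRequestEntry.builder().id("m1").messageBody("first").build(),
                SendMessageBatchRequestEntry.builder().id("m2").messageBody("second").build());

        SendMessageBatchResponse response = sqs.sendMessageBatch(
                SendMessageBatchRequest.builder()
                        .queueUrl(queueUrl)
                        .entries(entries)
                        .build());

        // The API reports success/failure per entry, not per request.
        response.successful().forEach(ok ->
                System.out.println("Sent entry " + ok.id()));
        response.failed().forEach(err ->
                System.out.println("Entry " + err.id() + " failed: " + err.code()
                        + " (senderFault=" + err.senderFault() + ") - retry just this one"));
    }
}
```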
My application consists of:
1 Amazon SQS message queue
n workers
The workers have the following logic:
1. Wait for message from SQS queue
2. Perform task described in message
3. Delete the message from the SQS queue
4. Go to (1)
I want each message to be received by only one worker to avoid redundant work.
Is there a mechanism to mark a message as "in progress" using SQS, so that other pollers do not receive it?
Alternatively, is it appropriate to delete the message as soon as it is received?
1. Wait for message from SQS queue
2. Delete the message from the SQS queue
3. Perform task described in message
4. Go to (1)
If I follow this approach, is there a way to recover received but unprocessed messages in case a worker crashes (step (3) fails)?
This question is specific to Spring, which contains all sorts of magic.
An SQS message is considered to be "in flight" after it is received from a queue by a consumer but not yet deleted from the queue. These messages are not visible to other consumers.
In SQS messaging, a message is considered "in flight" if:
you, the consumer, have received it, and
the visibility timeout has not expired, and
you have not deleted it.
SQS is designed so that you can call ReceiveMessage and a message is given to you for processing. You have some amount of time (the visibility timeout) to perform the processing on this message. During this "visibility" timeout, if you call ReceiveMessage again, no worker will be returned the message you are currently working with. It is hidden.
Once the visibility timeout expires the message will be able to be returned to future ReceiveMessage calls. This could happen if the consumer fails in some way. If the process is successful, then you can delete the message.
The number of messages that are hidden from ReceiveMessage calls is the "in flight" count. Currently, a standard SQS queue is set by default to allow a maximum of 120,000 in-flight messages.
http://docs.amazonwebservices.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/AboutVT.html
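A minimal sketch of that receive → process → delete cycle, assuming the AWS SDK for Java v2 (the queue URL and the task itself are placeholders). Deleting only after the work succeeds is what lets a crashed worker's message become visible again once the visibility timeout expires:

```java
// Minimal worker-loop sketch assuming AWS SDK for Java v2. A received message
// is hidden from other workers for the visibility timeout; it is deleted only
// after processing succeeds, so a crash simply lets it reappear for retry.
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.*;

public class Worker {
    public static void main(String[] args) {
        SqsClient sqs = SqsClient.create();
        String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/work-queue";

        while (true) {
            ReceiveMessageResponse response = sqs.receiveMessage(
                    ReceiveMessageRequest.builder()
                            .queueUrl(queueUrl)
                            .maxNumberOfMessages(1)
                            .waitTimeSeconds(20)      // long polling
                            .visibilityTimeout(60)    // time allowed for step 2
                            .build());

            for (Message message : response.messages()) {
                try {
                    performTask(message.body());      // step 2
                    sqs.deleteMessage(DeleteMessageRequest.builder()
                            .queueUrl(queueUrl)
                            .receiptHandle(message.receiptHandle())
                            .build());                // step 3
                } catch (Exception e) {
                    // Do nothing: the message becomes visible again after the
                    // visibility timeout and will be retried by some worker.
                }
            }
        }
    }

    private static void performTask(String body) {
        // task described in the message
    }
}
```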
There will be a ReceiptHandle string sent with each received message. It expires based on the queue's visibility timeout.
You can use this ReceiptHandle to delete the message from the queue.
For example, a message is sent to a queue at 2016-05-19 23:29:05, it is received by a consumer at 2016-05-19 23:30:05, and after processing it is deleted at 2016-05-19 23:30:08. That means it took 60 seconds from the time the message was sent until the consumer received it, and 63 seconds from the time the message was created until it was deleted.
This metric is helpful for monitoring how fast consumers can consume messages.
To get the lifetime of a message, I currently have to manually add a timestamp attribute to each message when it is sent to the queue. The consumer then retrieves the timestamp and calculates the interval.
Does SQS provide a similar API to get the lifetime of a message (I checked the SQS API with no luck), or is there any other good approach?
There are attributes on an SQS message that allow you to measure the time spent by a message in the queue.
The attributes you can use are
SentTimestamp and ApproximateFirstReceiveTimestamp (CreatedTimestamp and LastModifiedTimestamp are queue-level attributes, not message attributes)
I was looking at the JavaScript SDK, and it seems these attributes are only returned when explicitly requested:
http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/SQS.html#receiveMessage-property
For the list of all attributes available within an SQS message please refer to http://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/API_ReceiveMessage.html
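A minimal sketch of reading those timestamps with the AWS SDK for Java v2 (the builder method for requesting system attributes varies between SDK versions, so treat the names here as assumptions). SentTimestamp and ApproximateFirstReceiveTimestamp are epoch milliseconds, so their difference is how long the message waited in the queue before it was first received:

```java
// Minimal sketch: measuring how long a message sat in the queue, assuming
// AWS SDK for Java v2. The timestamps are returned only when explicitly
// requested, as epoch milliseconds.
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.*;

public class MessageAge {
    public static void main(String[] args) {
        SqsClient sqs = SqsClient.create();
        String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue";

        ReceiveMessageResponse response = sqs.receiveMessage(
                ReceiveMessageRequest.builder()
                        .queueUrl(queueUrl)
                        .attributeNamesWithStrings("SentTimestamp",
                                "ApproximateFirstReceiveTimestamp")
                        .build());

        for (Message message : response.messages()) {
            long sent = Long.parseLong(message.attributes()
                    .get(MessageSystemAttributeName.SENT_TIMESTAMP));
            long firstReceived = Long.parseLong(message.attributes()
                    .get(MessageSystemAttributeName.APPROXIMATE_FIRST_RECEIVE_TIMESTAMP));

            System.out.println("Time in queue: " + (firstReceived - sent) + " ms");
            System.out.println("Total age now: " + (System.currentTimeMillis() - sent) + " ms");
        }
    }
}
```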
I'm having issues where my SQS messages are never deleted from the SQS queue. They are only removed when the retention period ends, which is 4 days.
So to summarize the app:
Send URL to SQS Queue to wait to be crawled
Send message to Elastic Beanstalk App that crawls the data and stores it in the database
The script seems to be working in the meaning that it does receive the message, and it does crawl it successfully and store the data successfully in the database. The only issue is that the messages remain in the queue, stuck at "Message Available".
So if I, for example, load the queue with 800 messages, it will be stuck at ~800 messages for 4 days and then they will all be deleted at once because of the retention period. It seems like a few messages do get deleted, because the number changes slightly, but the large majority is never removed from the queue.
So question:
Isn't SQS supposed to remove the message as soon as it has been sent to and received by the script?
Is there a way for me to manually delete the current message in the script itself? From what I know the message is only sent one way, from SQS -> App, so I cannot do SQS <-> App.
Any ideas?
A web application in a worker environment tier should only listen on the local host. When the web application in the worker environment tier returns a 200 OK response to acknowledge that it has received and successfully processed the request, the daemon sends a DeleteMessage call to the SQS queue so that the message will be deleted from the queue. (SQS automatically deletes messages that have been in a queue for longer than the configured RetentionPeriod.) If the application returns any response other than 200 OK or there is no response within the configured InactivityTimeout period, SQS once again makes the message visible in the queue and available for another attempt at processing.
http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features-managing-env-tiers.html
So I guess that answers my question. Some messages do not return HTTP 200 and then they are stuck in an infinite loop.
No, the messages won't get deleted when you read a queue item; each item is only hidden for a specific amount of time, called the visibility timeout. The idea behind the visibility timeout is to ensure that if there are multiple consumers of a single queue, no two consumers pick up the same item and start processing it.
This is the change you need to make to your app to get the expected behavior:
Send URL to SQS Queue to wait to be crawled
Send message to Elastic Beanstalk App that crawls the data and stores it in the database
When the crawl succeeds, use the receipt handle (not the message ID) to delete the item from the queue, as sketched below.
AWS Documentation - DeleteMessage
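A minimal sketch of that delete step, assuming the crawler polls SQS with the AWS SDK for Java v2 (the queue URL and crawl() are placeholders); the receipt handle comes from the ReceiveMessage response and is what DeleteMessage needs:

```java
// Minimal sketch of deleting a message by its receipt handle after a
// successful crawl, assuming AWS SDK for Java v2; crawl() is hypothetical.
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.*;

public class Crawler {
    public static void main(String[] args) {
        SqsClient sqs = SqsClient.create();
        String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/crawl-queue";

        for (Message message : sqs.receiveMessage(ReceiveMessageRequest.builder()
                .queueUrl(queueUrl).waitTimeSeconds(20).build()).messages()) {
            boolean crawled = crawl(message.body());   // crawl and store in DB
            if (crawled) {
                // Use the receipt handle (not the message ID) to remove the
                // message; otherwise it reappears after the visibility timeout
                // and is only dropped at the end of the retention period.
                sqs.deleteMessage(DeleteMessageRequest.builder()
                        .queueUrl(queueUrl)
                        .receiptHandle(message.receiptHandle())
                        .build());
            }
        }
    }

    private static boolean crawl(String url) {
        return true; // placeholder for the crawling logic
    }
}
```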