Is it possible to know how many times an SQS message has been read?

I have a use case where I need to know, in my code, how many times an SQS message has been read.
For example, we read a message from SQS and, for some reason or exception, we can't process it. The same message becomes available in the queue again after the visibility timeout expires.
This creates an endless loop. Is there a way to know how many times a particular SQS message has been received and returned to the queue?
I am aware this can be handled via a dead-letter queue, but since that requires more effort I am checking whether there is any other option.
I don't want to retry a message if it fails more than x times; I want to delete it instead. Is that possible in SQS?

You can do this manually by looking at the ApproximateReceiveCount attribute of your messages; see this question on how to do so. You just need to implement the logic to read the count and decide whether to try processing the message or delete it. Note, however, that the receive count is affected by more than just programmatic processing: viewing messages in the console increments it too.
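A minimal sketch of that logic with boto3 (the queue URL, the threshold, and the process() helper are placeholders):

    import boto3

    sqs = boto3.client("sqs")
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"  # placeholder
    MAX_RECEIVES = 3  # give up after this many receives

    def process(body):
        ...  # your processing logic; raise on failure

    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        AttributeNames=["ApproximateReceiveCount"],
        MaxNumberOfMessages=10,
    )
    for msg in resp.get("Messages", []):
        receive_count = int(msg["Attributes"]["ApproximateReceiveCount"])
        if receive_count > MAX_RECEIVES:
            # Too many attempts: drop the message instead of retrying forever.
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
            continue
        process(msg["Body"])
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])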
That being said, a DLQ is a premade solution for exactly this use case. It's not a lot of additional work: all you have to do is create another SQS queue, set it as the DLQ of your processing queue, and set the number of retries. The DLQ then handles all your redrive logic, and instead of messages being deleted after n failures they're moved to the DLQ, where you can inspect them to understand why they're failing, set metric alarms on the queue, and, if you want, manually redrive the messages into your processing queue. Or just ignore them until they age out of the queue based on its retention policy. The important thing is that the DLQ gives you the option of seeing which messages failed after the fact, while deleting them outright does not.
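For comparison, wiring up the DLQ is just a redrive policy on the source queue. A sketch with boto3 (queue names and URL are placeholders):

    import json
    import boto3

    sqs = boto3.client("sqs")

    dlq_url = sqs.create_queue(QueueName="my-queue-dlq")["QueueUrl"]
    dlq_arn = sqs.get_queue_attributes(
        QueueUrl=dlq_url, AttributeNames=["QueueArn"]
    )["Attributes"]["QueueArn"]

    # After maxReceiveCount failed receives, SQS moves the message to the DLQ.
    sqs.set_queue_attributes(
        QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/my-queue",  # placeholder
        Attributes={
            "RedrivePolicy": json.dumps(
                {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": "3"}
            )
        },
    )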

When calling ReceiveMessage(), you can specify a list of AttributeNames that you would like returned.
One of these attributes is ApproximateReceiveCount, which returns "the number of times a message has been received across all queues but not deleted".
It is an 'approximate' count due to the highly parallel nature of SQS -- it is possible that the count is slightly off if a message was processed around the same time as this request.

Related

Concurrent processing of user Messages in SQS

We have several consumer processes that poll a standard SQS queue and process the messages. Each message is associated with a user. For each user, we can process 100 messages per minute; beyond that, the API we use for processing starts returning 500 errors.
Since the queue also contains messages for other users, we can't cherry-pick the messages of users whose quota is still under the limit.
One solution to this is using a FIFO queue and implementing message groups. But FIFO has a peculiar limitation:
You can have a maximum of 20,000 in-flight messages
This would have been completely fine, but the issue is that when a message is in flight from a message group, SQS adds the count of all the messages in that group to the in-flight count.
This article explains it in more detail:
https://tomgregory.com/3-surprising-facts-about-aws-sqs-fifo-queues/#:~:text=A%20FIFO%20queue%20has%20a%20maximum%20inflight%20message%20limit%20of%2020%2C000.
In the following article, read the "20,000 message buffer" section; it might explain what's happening:
https://aws.amazon.com/premiumsupport/knowledge-center/sqs-message-backlog/
The second solution I could think of is to make the producer smarter. But in our case the producer is a completely different microservice, and its owners hardly listen.
We definitely want our consumers to scale so that each user sees minimal wait time, but we can't because of the above reasons.
I genuinely feel SQS was not the correct choice for this design, but I can't convince my superiors of the same.
Is there a way we can overcome this situation, or have we hit a dead end?
This would have been completely fine, but the issue is that when a message is in flight from a message group, SQS adds the count of all the messages in that group to the in-flight count
I do not think this is the case for FIFO queues. I am using a FIFO queue where each of 3 consumers processes one message at a time. There are messages from the same message group in the queue, but the in-flight message count for me is always 3, i.e. each of the 3 consumers is processing one of them. When a consumer finishes a message (the processing time per message is variable here), it picks up the next one in the queue. The in-flight message count remains 3 the whole time.
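For illustration, using one message group per user would look like this with boto3 (queue URL, user id, and payload are hypothetical):

    import boto3

    sqs = boto3.client("sqs")
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/jobs.fifo"  # placeholder

    # One group per user: SQS preserves order within a group and will not hand
    # out another message from a group while one of its messages is in flight.
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody='{"user": "user-42", "action": "sync"}',
        MessageGroupId="user-42",
        MessageDeduplicationId="user-42-evt-0001",  # or enable content-based deduplication
    )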

Deleting a message from SQS after certain number of receives regardless of success/failure

I am using SQS queues in two places in my Spring Boot application:
1. In one queue, I would like messages to be routed to a DLQ when the number of receives for a given message is >= 3.
2. For the second queue, I don't want to configure a DLQ at all.
In (1) and (2), however, I would like to delete the message from the DLQ and the normal queue, respectively, after 3 receives.
As of now, I cannot find any such configuration in SQS that allows me to delete a message from the queue after a certain number of receives.
Maybe I am missing something. Could anyone please help here?
There is no mechanism for "automated" deletion of messages from an SQS queue after a given number of unsuccessful receives if you don't want to use a DLQ.
Without a DLQ, SQS will keep messages in the queue until they expire. Thus, if you want this behavior, you have to build it yourself: store the number of times each message has been received, e.g. in DynamoDB, and upon the third receive have the consumer explicitly delete the message from the queue.
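A sketch of that counter approach (the table name and key schema are assumptions; the table would use MessageId as its partition key):

    import boto3

    sqs = boto3.client("sqs")
    table = boto3.resource("dynamodb").Table("sqs-receive-counts")  # hypothetical table

    def handle(msg, queue_url, max_receives=3):
        # Atomically increment this message's receive counter in DynamoDB.
        count = table.update_item(
            Key={"MessageId": msg["MessageId"]},
            UpdateExpression="ADD ReceiveCount :one",
            ExpressionAttributeValues={":one": 1},
            ReturnValues="UPDATED_NEW",
        )["Attributes"]["ReceiveCount"]
        if count >= max_receives:
            # Enough attempts: delete instead of processing again.
            sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])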
Alternatively, you can explore SQS message attributes: once you have received a message, delete it from the queue and send it back with an added message attribute stating how many times you have received it.
Ref: https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-java-send-message-with-attributes.html
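A sketch of that receive-delete-resend pattern (the attribute name is arbitrary):

    import boto3

    sqs = boto3.client("sqs")
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"  # placeholder

    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MessageAttributeNames=["ReceiveCount"])
    for msg in resp.get("Messages", []):
        attrs = msg.get("MessageAttributes", {})
        count = int(attrs.get("ReceiveCount", {}).get("StringValue", "0")) + 1
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
        if count < 3:
            # Re-enqueue with the updated counter instead of relying on redrive.
            sqs.send_message(
                QueueUrl=QUEUE_URL,
                MessageBody=msg["Body"],
                MessageAttributes={
                    "ReceiveCount": {"DataType": "Number", "StringValue": str(count)}
                },
            )

Note that re-sending produces a brand-new message, so the built-in ApproximateReceiveCount and the MessageId reset each time; the custom attribute is the only counter that survives.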

AWS SQS Dead Letter Queue notifications

I'm trying to design a small message processing system based on SQS, Lambda, and SNS. In case of failure, I'd like for the message to be enqueued in a Dead Letter Queue (DLQ) and for a webhook to be called.
I'd like to know what the most canonical or reasonable way of achieving that would look like.
Currently, if everything goes well, the process should be as follows:
SQS (in place to handle retries) enqueues a message
Lambda gets invoked by SQS and processes the message
Lambda sends a webhook and finishes normally
If something in the Lambda goes wrong (the success webhook cannot be called, or the task at hand cannot be processed), the easiest way to achieve what I want seems to be to set up a DLQ1 that SQS would put the failed messages in. An auxiliary Lambda would then be invoked to process this message and pass it to SNS, which would call the failure webhook and also forward the message to DLQ2, the final/true DLQ.
Is that the best approach?
One alternative I know of is alarms, though I've been warned that they are quite tricky. Another would be to have the Lambda call the error-reporting webhook if there's a failure on the last retry, although that somehow seems inappropriate.
Thanks!
Your architecture looks good enough for the success case, but I personally find it quite confusing when anything goes wrong, as I don't see why you need two DLQs to begin with.
Here's what I would do in case of failure:
Define a DLQ on your source SQS Queue and set the maxReceiveCount to e.g. 3, meaning if messages fail three times, they will be redirected to the configured DLQ
Create a Lambda that listens to this DLQ.
Execute the webhook inside this Lambda.
Since step 3 automatically deletes the message from the queue once it has been processed, and you apparently want the messages to be persisted somewhere, store the content of the message in a file on S3 and store the file's metadata (bucket and key) in a DynamoDB table, so you can always query for failed messages.
I don't see any role for SNS here unless you want multiple subscribers for a given message, which, as I understand it, is not the case.
This way, you need to maintain only one DLQ, and you can get rid of SNS, since it only adds an extra layer of complexity to your architecture.
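A sketch of that DLQ-consumer Lambda (bucket name, table name, and webhook URL are all hypothetical):

    import json
    import urllib.request
    import boto3

    s3 = boto3.client("s3")
    table = boto3.resource("dynamodb").Table("failed-messages")  # hypothetical
    BUCKET = "my-failed-messages"                                # hypothetical
    WEBHOOK_URL = "https://example.com/failure-hook"             # hypothetical

    def handler(event, context):
        # Invoked by the SQS event source mapping on the DLQ.
        for record in event["Records"]:
            key = f"failed/{record['messageId']}.json"
            s3.put_object(Bucket=BUCKET, Key=key, Body=record["body"].encode())
            table.put_item(Item={"MessageId": record["messageId"],
                                 "Bucket": BUCKET, "Key": key})
            req = urllib.request.Request(
                WEBHOOK_URL,
                data=json.dumps({"messageId": record["messageId"]}).encode(),
                headers={"Content-Type": "application/json"},
            )
            urllib.request.urlopen(req)  # fire the failure webhook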

AWS SQS FIFO - How to get more than 10 messages at a time?

Currently we want to pull down an entire FIFO queue, process the contents, and release messages back into the queue if there are any issues.
The problem is that AWS only gives us 10 messages at a time (the only way to get messages in bulk from SQS is to issue multiple requests of at most 10 messages each), and won't give us the next 10 until we delete or release the first 10.
We need to get more than 10, though. Is this not possible? We understand we can set the group_id to a random string, which allows processing more, but then the order isn't guaranteed, which defeats the purpose of FIFO.
I managed to reproduce your results -- I could retrieve 10 messages, but then running the same command again would not return another set of messages.
The relevant documentation seems to be:
While messages with a particular MessageGroupId are invisible, no more messages belonging to the same MessageGroupId are returned until the visibility timeout expires. You can still receive messages with another MessageGroupId as long as it is also visible.
I suspect (just a theory!) that this is to preserve the ordering of messages: if a client has asked for a set of messages and they are still being processed, there is a chance those messages might be returned to the queue. Therefore, no further messages are provided until the original ones are deleted or their visibility timeout expires.
This is only a behaviour of FIFO queues.
It seems that you will need to receive and delete all messages to be able to access them all. I would suggest the following (see the sketch after this list):
Receive one (or more) messages.
Process them. If everything worked, delete each message.
If there were problems, push the message to a new queue.
Once the queue is empty, read the messages from the new queue and send them back to the original queue (which should preserve their ordering).
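A sketch of that drain-and-requeue loop (queue URLs and the process() helper are placeholders; it assumes the holding queue is also FIFO so ordering survives the round trip):

    import boto3

    sqs = boto3.client("sqs")
    SOURCE = "https://sqs.us-east-1.amazonaws.com/123456789012/main.fifo"      # placeholder
    HOLDING = "https://sqs.us-east-1.amazonaws.com/123456789012/holding.fifo"  # placeholder

    def process(body):
        ...  # your processing logic; raise on failure

    while True:
        resp = sqs.receive_message(QueueUrl=SOURCE, MaxNumberOfMessages=10,
                                   AttributeNames=["MessageGroupId"])
        messages = resp.get("Messages", [])
        if not messages:
            break  # queue drained
        for msg in messages:
            try:
                process(msg["Body"])
            except Exception:
                # Park the failed message; reusing the group id keeps relative order.
                sqs.send_message(
                    QueueUrl=HOLDING,
                    MessageBody=msg["Body"],
                    MessageGroupId=msg["Attributes"]["MessageGroupId"],
                    MessageDeduplicationId=msg["MessageId"],
                )
            # Delete either way so the next batch becomes visible.
            sqs.delete_message(QueueUrl=SOURCE, ReceiptHandle=msg["ReceiptHandle"])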
If you frequently require more capabilities than Amazon SQS provides, you could consider using Amazon MQ, a managed message broker service for ActiveMQ. It has many more capabilities (but is accordingly less 'simple').
If you set another MessageGroupId, you can get another 10 messages, even if you haven't released or deleted the previous ones.
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/using-messagegroupid-property.html

What if my lambda job, which is subscribed to an AWS SNS topic, goes down or stops working?

I have one publisher and one subscriber for my SNS topic in AWS.
Suppose my subscriber fails and exits with an error.
Will SNS redeliver those failed messages?
If not...
Is there another way to achieve that goal, so that my system resumes processing from the last successful Lambda execution?
There is a retry policy, but if your application has already received the message, then no. If something goes wrong you won't see it again, and since Lambdas don't carry state... you could be in trouble.
I might consider looking at SQS instead of SNS. Remember, messages in SQS won't be removed until you remove them, and you can set a window of invisibility. Therefore, you can easily ensure the next Lambda execution picks up where things left off (depending on your settings). Each Lambda would then be responsible for removing its message from SQS, and that's how you'd know the message was processed.
Without knowing more about your application and needs I couldn't say for sure, but I would take a look at it. I've built a "taskmaster" Lambda before that ran on a schedule and read from an SQS queue (multiple queues, actually; the scheduled job passed a different JSON event depending on which queue to read from). It would then pass each job off to the appropriate Lambda "worker", which would then remove that message. Should a worker stop working, the invisibility period would time out (and 5 minutes isn't bad here, given that's all a Lambda can execute for) and the next Lambda would pick the job up. The taskmaster would then run as often as needed and read as many jobs from the queue as necessary. This gives you complete control over the rate at which you process things, how many times you retry, and so on. You can also make use of a dead-letter queue to catch anything that may have failed (and think about sticking things back into the queue).
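A sketch of that taskmaster pattern (queue URL and worker function name are hypothetical; the worker deletes its own message on success):

    import json
    import boto3

    sqs = boto3.client("sqs")
    lam = boto3.client("lambda")
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/jobs"  # hypothetical
    WORKER = "job-worker"                                                # hypothetical

    def handler(event, context):
        # Scheduled "taskmaster": read a batch of jobs and fan out to workers.
        resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10)
        for msg in resp.get("Messages", []):
            # Async-invoke the worker; if it never deletes the message, the
            # message simply reappears after the visibility timeout.
            lam.invoke(
                FunctionName=WORKER,
                InvocationType="Event",
                Payload=json.dumps({"body": msg["Body"],
                                    "receiptHandle": msg["ReceiptHandle"]}),
            )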
You have a LOT of flexibility with SQS that I'm not sure you get with SNS, to be honest. I was never fond of SNS, though it too has a place and time; so again, without knowing more, I couldn't say whether SQS would be the fit for you. But I think your concerns can be taken care of with SQS if it makes sense for your application.