Getting an SQS message state in boto

I have an SQS producer and lots of consumers. It would be very helpful to know if the producer can tell whether a particular message has been deleted by a consumer or not. Is there a way of doing this? I'm currently using boto 2.6.0.

As far as I know, SQS does not provide any mechanism to be notified when a message is deleted. So, if you want to know when messages are deleted, you will have to track it yourself, e.g. by keeping a database of message IDs and having consumers report the ID of each message they delete.

As far as I know you can't track a message in a queue. Depending on your goal you could try the following things:
Monitoring
Write the results of a job to a logfile, and maybe use something like Logstash with Kibana. If you get creative, you might even push things straight into something like Elasticsearch or SimpleDB.
Callback
The receivers could fire some kind of "callback" to the producer, or to any other process, that updates the message's state in, for example, a database or a cache.
Keep in mind that this means that as you scale up your receivers, the process handling the callbacks has to scale up as well. Also keep an eye on your indexes, and make sure your status-update writes are fast.
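To make the callback idea concrete, here is a minimal sketch assuming boto3 and a hypothetical DynamoDB table named message_status keyed on MessageId; neither is prescribed by the answer, and a cache or a plain SQL table would work just as well:

```python
import boto3

sqs = boto3.client("sqs")
# Hypothetical table written by consumers and queried by the producer.
status_table = boto3.resource("dynamodb").Table("message_status")

def process(body):
    """Stand-in for your actual message handling."""
    print(body)

def consume_one(queue_url):
    resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1)
    for msg in resp.get("Messages", []):
        process(msg["Body"])
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
        # The "callback": record the deletion so the producer can look it up.
        status_table.update_item(
            Key={"MessageId": msg["MessageId"]},
            UpdateExpression="SET msg_state = :s",
            ExpressionAttributeValues={":s": "DELETED"},
        )
```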

Related

is it possible to know how many times an sqs message has been read

I have a use case where I need to know how many times an SQS message has been read in my code.
For example, we read a message from SQS and, for some reason/exception, we can't process it. The same message becomes available to read again after the visibility timeout.
This can create an endless loop. Is there a way to know how many times a particular SQS message has been read and returned to the queue?
I am aware this can be handled via a dead-letter queue; since that requires more effort, I am checking whether there is any other option.
I don't want to retry a message if it fails more than x times; I want to delete it. Is that possible in SQS?
You can do this manually by looking at the ApproximateReceiveCount attribute of your messages; see this question on how to do so. You just need to implement the logic to read the count and decide whether to try processing the message or to delete it. Note, however, that the receive count is affected by more than just programmatic processing: viewing messages in the console increments it too.
That being said, a DLQ is a pre-made solution for exactly this use case. It's not a lot of additional work: all you have to do is create another SQS queue, set it as the DLQ of your processing queue, and set the number of retries. The DLQ then handles all your redrive logic, and instead of being deleted after n failures, messages are moved to the DLQ, where you can inspect them to understand why they're failing, set metric alarms on the queue, and, if you want, manually redrive them into your processing queue. Or just ignore them until they age out of the queue based on its retention policy. The important thing is that the DLQ gives you the option of seeing which messages failed after the fact, while deleting them outright does not.
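Wiring this up takes only a couple of API calls. A sketch with boto3 (the queue names and retry count are illustrative):

```python
import json
import boto3

sqs = boto3.client("sqs")

# Create the DLQ and look up its ARN.
dlq_url = sqs.create_queue(QueueName="my-queue-dlq")["QueueUrl"]
dlq_arn = sqs.get_queue_attributes(
    QueueUrl=dlq_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# Point the processing queue at the DLQ: messages received more than
# maxReceiveCount times without being deleted are moved there.
queue_url = sqs.create_queue(QueueName="my-queue")["QueueUrl"]
sqs.set_queue_attributes(
    QueueUrl=queue_url,
    Attributes={
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": "5"}
        )
    },
)
```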
When calling ReceiveMessage(), you can specify a list of AttributeNames that you would like returned.
One of these attributes is ApproximateReceiveCount, which returns "the number of times a message has been received across all queues but not deleted".
It is an 'approximate' count due to the highly parallel nature of SQS -- it is possible that the count is slightly off if a message was processed around the same time as this request.
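For example, with boto3 (the attribute works the same way in boto 2; the queue URL here is illustrative):

```python
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"  # illustrative

resp = sqs.receive_message(
    QueueUrl=queue_url,
    AttributeNames=["ApproximateReceiveCount"],
    MaxNumberOfMessages=1,
)
for msg in resp.get("Messages", []):
    count = int(msg["Attributes"]["ApproximateReceiveCount"])
    print(f"message {msg['MessageId']} received ~{count} times so far")
```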

Is it possible to selectively read from AWS SQS?

I have a use case: I want to read from SQS at all times, except when another event is happening.
For instance, I have football news in SQS as messages. I want to retrieve them at all times, except while live matches are happening.
Is there any way to read messages except while another event is doing its job?
I've scrolled through the docs and Stack Overflow, but I don't see a solution.
COMMENT: I have a small and weak service, and because of technical limitations I cannot scale it up (memory/CPU, etc.), but I still want the two "conflicting" flows to live in the service. They are both supposed to talk to the same API, and I don't want them to send conflicting requests.
Is there a way to do it, or will I have to write a custom communicator with SQS?
You can't select which messages you want to read from SQS and which you'd rather not - there is no filtering in SQS.
If you have messages that need to be processed at all times and others that need to be processed only sometimes or in batches, you should put them in separate queues and read from them separately.
You don't say anything about the infrastructure that reads from the queue, but if it's a process on EC2, you could just stop it while live matches are happening and restart it later. SQS is built for asynchronous messaging and will store the messages for up to 14 days (depending on your configuration) until a consumer is available to read them.
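If the consumer is a process you control, the pause can also live inside the polling loop itself. A sketch, where is_live_match_on() is a placeholder for whatever signal you have (a schedule lookup, a feature flag, an API call):

```python
import time
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/news"  # illustrative

def is_live_match_on():
    """Placeholder for your own check."""
    return False

def handle(body):
    """Stand-in for your actual processing."""
    print(body)

while True:
    if is_live_match_on():
        time.sleep(60)  # stop consuming; SQS simply retains the messages
        continue
    resp = sqs.receive_message(
        QueueUrl=queue_url, WaitTimeSeconds=20, MaxNumberOfMessages=10
    )
    for msg in resp.get("Messages", []):
        handle(msg["Body"])
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```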

Selecting message queue approach for multiple consumers in AWS

Please help me select an MQ app/system/approach for the following use case:
Check for incoming messages for a specific user -> read the message if available -> delete it from the queue, ideally staying within AWS.
Context:
Social networking app where users receive messages, i.e. I need to identify incoming messages by recipient ID.
The app is doing long-polls for new messages every 30 seconds.
Message size is <1Kb.
As per current estimates, I'll need 100M+ message checks per month in total (far fewer actual messages, though; these are just checks).
Users acknowledge messages by choosing OK or Ignore; I'm not sure whether that means I need ACK support from the MQ system.
I'm in AWS. I initially thought of SQS, but the more I read, the less it looks like a good match: I can't tag a message with a recipient ID in a way that lets me filter by recipient, etc. However, maybe I'm wrong.
One option I also considered is to just use a DynamoDB "messages" table, with userId as the partition key and messageId as the sort key, so I could easily query by user; however, I'm concerned about costs.
If possible, I would much prefer to stay within AWS, or at least use a SaaS offering like SQS; as a one-person startup, I really want to avoid the headaches of supporting a self-hosted system.
Thank you!
D
You are right on both counts:
SQS won't work, because of the limitation you pointed out.
DynamoDB would work, but would cost a lot.
I can suggest the following:
Create a Redis cluster, possibly on Amazon ElastiCache.
In it, make one list per user.
Whenever a new message comes in, append it to the user's list.
To deliver messages, just read from the user's list; flush the list if needed.
What I am suggesting is very similar to how Twitter manages each user's news feed and home feed.
It should also be cheap.
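A minimal sketch of the per-user list idea with redis-py (the key-naming scheme is an assumption, not part of the answer):

```python
import redis

r = redis.Redis(host="localhost", port=6379)  # e.g. your ElastiCache endpoint

def push_message(user_id: str, message: str) -> None:
    """Append a new message to the recipient's list."""
    r.rpush(f"messages:{user_id}", message)

def read_and_flush(user_id: str) -> list:
    """Read all pending messages for a user, then empty the list atomically."""
    key = f"messages:{user_id}"
    pipe = r.pipeline()  # MULTI/EXEC transaction by default
    pipe.lrange(key, 0, -1)
    pipe.delete(key)
    messages, _ = pipe.execute()
    return [m.decode() for m in messages]
```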

Redeliver unprocessed EventHub messages in IEventProcessor.ProcessEventsAsync

In IEventProcessor.ProcessEventsAsync I want to store events in a persisted store. It's possible this store is unavailable and messages cannot be persisted. How can I mark these messages for redelivery later?
The store may be down for only a few hours, but until it's back up, every message is affected and cannot be persisted.
I don't think you can mark a particular event for redelivery in Event Hubs, unlike with a Service Bus queue. However, Event Hubs does provide a retention policy and an offset for each event, which makes it possible to reprocess an old event. You can read more in the "checkpointing" section of this document: https://azure.microsoft.com/en-us/documentation/articles/event-hubs-overview/
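In practice that means: only checkpoint after a successful write to your store; events received while the store is down are then replayed from the last checkpoint when the processor restarts. A rough sketch of the pattern using the later Python azure-eventhub SDK (the question targets the .NET IEventProcessor API, so treat this purely as an illustration; persist and StoreUnavailable are hypothetical):

```python
from azure.eventhub import EventHubConsumerClient

class StoreUnavailable(Exception):
    pass

def persist(body):
    """Hypothetical write to the persisted store; may raise StoreUnavailable."""

def on_event(partition_context, event):
    try:
        persist(event.body_as_str())
    except StoreUnavailable:
        # Don't checkpoint: after a restart, processing resumes from the
        # last checkpoint and this event is delivered again.
        return
    partition_context.update_checkpoint(event)

# A checkpoint_store (e.g. Azure Blob storage) is needed for checkpoints
# to survive restarts; it is omitted here for brevity.
client = EventHubConsumerClient.from_connection_string(
    "<connection-string>", consumer_group="$Default", eventhub_name="<hub-name>"
)
with client:
    client.receive(on_event=on_event, starting_position="-1")
```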
Adding to Tyler's response, I suppose you could use some kind of "poison message"/dead-letter-queue approach. Event Hubs does not have that functionality, but Service Bus queues do.
In any case, I think it has to be handled programmatically, not by the backend itself.
There is a good article about a slightly different scenario, but the approach is like what I mean:
https://www.dougv.com/2015/07/handling-poison-messages-in-an-azure-service-bus-queue/

SQS/task-queue job retry count strategy?

I'm implementing a task queue with Amazon SQS (but I guess the question applies to any task queue), where the workers are expected to take different actions depending on how many times the job has already been retried (move it to a different queue, increase the visibility timeout, send an alert, etc.).
What would be the best way to keep track of the failed-job count? I'd like to avoid keeping a centralized database of job:retry-count records. Should I instead look at time spent in the queue, in a monitoring process? IMO that would be ugly, or unclean at best, iterating over jobs until I find ancient ones..
thanks!
Andras
There is another, simpler way: with your message you can request the ApproximateReceiveCount attribute and base your retry logic on that. This way you won't have to keep the count in a database; you can read it from the message itself.
http://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/API_ReceiveMessage.html
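A sketch of branching on that count with boto3 (the threshold, queue URL, and chosen actions are all illustrative):

```python
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/tasks"  # illustrative

resp = sqs.receive_message(
    QueueUrl=queue_url,
    AttributeNames=["ApproximateReceiveCount"],
    MaxNumberOfMessages=1,
)
for msg in resp.get("Messages", []):
    attempts = int(msg["Attributes"]["ApproximateReceiveCount"])
    if attempts > 3:
        # Too many retries: give up (or move the job to another queue / alert).
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
    else:
        # Back off: delay the next retry proportionally to the attempt count.
        sqs.change_message_visibility(
            QueueUrl=queue_url,
            ReceiptHandle=msg["ReceiptHandle"],
            VisibilityTimeout=30 * attempts,
        )
```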
I've had good success combining SQS with SimpleDB. It is "centralized", but only as much as SQS is.
Every job gets a record in SimpleDB and a task in SQS. You can put any information you like in SimpleDB, such as the job creation time. When a worker pulls a job from the queue, it can grab the corresponding record from SimpleDB to determine its history: you can see how old the job is and how many times it has been attempted. Once you're done, you can add worker data to the SimpleDB record (completion time, outcome, logs, errors, stack trace, whatever) and acknowledge the message in SQS.
I prefer this method because it helps diagnose faults by providing lots of debug info for failed tasks. It also allows workers to handle the job differently depending on how long the job has been queued, how many failures it's had, etc.
It also gives you the ability to query SimpleDB directly and calculate things like average time per task, percent failure rate, etc.
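A sketch of the pairing, assuming boto3's "sdb" client (the domain name, queue URL, and attribute names are illustrative):

```python
import time
import boto3

sdb = boto3.client("sdb")
sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/jobs"  # illustrative

def enqueue_job(job_id: str) -> None:
    """Record the job in SimpleDB, then queue the task."""
    sdb.put_attributes(
        DomainName="jobs",
        ItemName=job_id,
        Attributes=[
            {"Name": "created_at", "Value": str(time.time()), "Replace": True}
        ],
    )
    sqs.send_message(QueueUrl=queue_url, MessageBody=job_id)

def job_history(job_id: str) -> dict:
    """Fetch the job's record to see its age, attempts, errors, etc."""
    resp = sdb.get_attributes(
        DomainName="jobs", ItemName=job_id, ConsistentRead=True
    )
    return {a["Name"]: a["Value"] for a in resp.get("Attributes", [])}
```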
Amazon just released Simple Workflow Service (SWF), which you can think of as a more sophisticated/flexible version of GAE task queues.
It will let you monitor your tasks (with heartbeats), configure retry strategies, and create complicated workflows. It looks pretty promising for abstracting out task dependencies, scheduling, and fault tolerance for tasks (especially asynchronous ones).
Check out http://docs.amazonwebservices.com/amazonswf/latest/developerguide/swf-dg-intro-to-swf.html for an overview.
SQS stands for "Simple Queue Service", which is conceptually a misnomer: the first and foremost feature of a queue is FIFO (first in, first out) ordering, and standard SQS queues don't guarantee it. Just wanting to clarify.
Azure Storage queues lack that as well. For the best cloud queue service, use Azure Service Bus, since it implements a true queue.