fetching all the messages from AWS FIFO SQS - python-2.7

I am trying fetch messages from FIFO sqs queue. Here is the sample code:
import boto3
sqs_client = boto3.resource(
'sqs',
#aws_access_key_id=AWS_ACCESS_KEY,
#aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
region_name='us-east-2'
)
queue_name = 'test_queue.fifo'
response = sqs_client.create_queue(
QueueName=queue_name,
Attributes={
'FifoQueue': 'true',
'ContentBasedDeduplication': 'true'
}
)
for i in range(0,50):
status = response.send_message(MessageBody = 'This is test message #'+str(i), MessageGroupId='586474de88e03')
while True:
messages = response.receive_messages(MaxNumberOfMessages=10)
if len(messages)>0:
for message in messages:
print message.body
else:
print('Queue is now empty')
break
but what I am getting is only the first 10 messages and then its showing "Queue is now empty" although I can see there are 40 available messages in the queue from AWS console.
So here I want to fetch all the messages from the queue in loop. Any lead would be appreciated. Thanks.

When there is a small number of messages in an SQS queue, especially an extremely small number as in your case, you may not get any messages returned and may need to retry the call:
Short poll is the default behavior where a weighted random set of machines is sampled on a receive-message call. Thus, only the messages on the sampled machines are returned. If the number of messages in the queue is small (fewer than 1,000), you most likely get fewer messages than you requested per receive-message call. If the number of messages in the queue is extremely small, you might not receive any messages in a particular receive-message response. If this happens, repeat the request.
Also, generally speaking, once you receive a set of messages, you process them and then delete the messages that you processed - for testing purpose at least you may want to delete each returned message after each 'print message.body', and before you make the next receive request.

Your Question :I want to fetch all the messages from the queue in loop.............. My answer :(read it completely) for fifo queue . Read that message then send that same message back to that queue and delete it from the queue .... It would be safe only if u do so(by proper exceptions hadlling and Message handler) . Try writing python programs with proper loggers and make it fail safe . Actually ur your is not fail safe .

Related

QueueTrigger : Mesage with encoding error doesnt push message to poison queue

I have webjob queue trigger which is responding to queue message and it works fine. However sometimes we push messages manually in queue and if there is manual mistake which causes DecoderFallBackException. But the strange behavior is that looks like webjob keeps trying unlimited times and our AI logs are creating a mess. I tried restarting webjob to see if it clears any internal cache but doesn’t help.
only thing which helps is deleting queue
Ideally any exception beyond deque count should move message to poison queue.
I've tried to reproduce your issue on my side, but it works well. First I create a backend demo to insert invalid byte message in queue that could cause DecoderFallBackException
Encoding ae = Encoding.GetEncoding(
"us-ascii",
new EncoderExceptionFallback(),
new DecoderExceptionFallback());
string inputString = "XYZ";
byte[] encodedBytes = new byte[ae.GetByteCount(inputString)];
ae.GetBytes(inputString, 0, inputString.Length,
encodedBytes, 0);
//make the byte invalid
encodedBytes[0] = 0xFF;
encodedBytes[2] = 0xFF;
CloudQueueMessage message = new CloudQueueMessage(encodedBytes);
queue.AddMessage(message);
Web Job code:
public static void ProcessQueueMessage([QueueTrigger("queue")] string message, TextWriter log)
{
log.WriteLine(message);
}
After 5 times the exception occurs, the message is moved to 'queue-poison'. This is the expected behavior. Check here for details:
maxDequeueCount 5 The number of times to try processing a message before moving it to the poison queue.
You may check if you accidently set "maxDequeueCount" to bigger value. If not, please provide your webjob code and how you find DecoderFallBackException for us to investigate.

AWS SQS FIFO ApproximateNumberOfMessages

I have 2 FIFO queues (with priority).
I need to process only one message at a time.
I can't just get messages from high queue - if any - process, if noone - get from low queue. I need to understant, if there are any messages in flight.
I need to check if there are any messages in flight and from which queue to check.
So i load ApproximateNumberOfMessages and ApproximateNumberOfMessagesNotVisible for each queue and analyze them.
This is my part of code
if (highPriorityQueueState.MessagesInFlight || lowPriorityQueueState.MessagesInFlight) {
return Promise.resolve();
}
if (highPriorityQueueState.MessagesInQueue) {
return _receiveMessageFromQueue(highPriorityQueueUrl);
}
if (lowPriorityQueueState.MessagesInQueue) {
return _receiveMessageFromQueue(lowPriorityQueueUrl);
}
return Promise.resolve();
The Question is: can i rely on this logic? The Amazon Documentation ( http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-resources-required-process-messages.html ) says that for FIFO queues, these values are exact. But i'm not sure. Maybe anyone know how to process 2 queues with only exactly one message in flight?

How to limit an Akka Stream to execute and send down one message only once per second?

I have an Akka Stream and I want the stream to send messages down stream approximately every second.
I tried two ways to solve this problem, the first way was to make the producer at the start of the stream only send messages once every second when a Continue messages comes into this actor.
// When receive a Continue message in a ActorPublisher
// do work then...
if (totalDemand > 0) {
import scala.concurrent.duration._
context.system.scheduler.scheduleOnce(1 second, self, Continue)
}
This works for a short while then a flood of Continue messages appear in the ActorPublisher actor, I assume (guess but not sure) from downstream via back-pressure requesting messages as the downstream can consume fast but the upstream is not producing at a fast rate. So this method failed.
The other way I tried was via backpressure control, I used a MaxInFlightRequestStrategy on the ActorSubscriber at the end of the stream to limit the number of messages to 1 per second. This works but messages coming in come in at approximately three or so at a time, not just one at a time. It seems the backpressure control doesn't immediately change the rate of messages coming in OR messages were already queued in the stream and waiting to be processed.
So the problem is, how can I have an Akka Stream which can process one message only per second?
I discovered that MaxInFlightRequestStrategy is a valid way to do it but I should set the batch size to 1, its batch size is default 5, which was causing the problem I found. Also its an over-complicated way to solve the problem now that I am looking at the submitted answer here.
You can either put your elements through the throttling flow, which will back pressure a fast source, or you can use combination of tick and zip.
The first solution would be like this:
val veryFastSource =
Source.fromIterator(() => Iterator.continually(Random.nextLong() % 10000))
val throttlingFlow = Flow[Long].throttle(
// how many elements do you allow
elements = 1,
// in what unit of time
per = 1.second,
maximumBurst = 0,
// you can also set this to Enforcing, but then your
// stream will collapse if exceeding the number of elements / s
mode = ThrottleMode.Shaping
)
veryFastSource.via(throttlingFlow).runWith(Sink.foreach(println))
The second solution would be like this:
val veryFastSource =
Source.fromIterator(() => Iterator.continually(Random.nextLong() % 10000))
val tickingSource = Source.tick(1.second, 1.second, 0)
veryFastSource.zip(tickingSource).map(_._1).runWith(Sink.foreach(println))

How can I check if a message is about to pass the MessageRetentionPeriod?

I have an app that uses SQS to queue jobs. Ideally I want every job to be completed, but some are going to fail. Sometimes re-running them will work, and sometimes they will just keep failing until the retention period is reached. . I want to keep failing jobs in the queue as long as possible, to give them the maximum possible chance of success, so I don't want to set a maxReceiveCount. But I do want to detect when a job reaches the MessageRetentionPeriod limit, as I need to send an alert when a job fails completely. Currently I have the max retention at 14 days, but some jobs will still not be completed by then.
Is there a way to detect when a job is about to expire, and from there send it to a deadletter queue for additional processing?
Before you follow my advice below and assuming I've done the math for periods correctly, you will be better off enabling a redrive policy on the queue if you check for messages less often than every 20 minutes and 9 seconds.
SQS's "redrive policy" allows you to migrates messages to a dead letter queue after a threshold number of receives. The maximum receives that AWS allows for this is 1000, and over 14 days that works out to about 20 minutes per receive. (For simplicity, that is assuming that your job never misses an attempt to read queue messages. You can tweak the numbers to build in a tolerance for failure.)
If you check more often than that, you'll want to implement the solution below.
You can check for this "cutoff date" (when the job is about to expire) as you process the messages, and send messages to the deadletter queue if they've passed the time when you've given up on them.
Pseudocode to add to your current routine:
Call GetQueueAttributes to get the count, in seconds, of your queue's Message Retention Period.
Call ReceiveMessage to pull messages off of the queue. Make sure to explicitly request that the SentTimestamp is visible.
Foreach message,
Find your message's expiration time by adding the message retention period to the sent timestamp.
Create your cutoff date by subtracting your desired amount of time from the message's expiration time.
Compare the cutoff date with the current time. If the cutoff date has passed:
Call SendMessage to send your message to the Dead Letter queue.
Call DeleteMessage to remove your message from the queue you are processing.
If the cutoff date has not passed:
Process the job as normal.
Here's an example implementation in Powershell:
$queueUrl = "https://sqs.amazonaws.com/0000/my-queue"
$deadLetterQueueUrl = "https://sqs.amazonaws.com/0000/deadletter"
# Get the message retention period in seconds
$messageRetentionPeriod = (Get-SQSQueueAttribute -AttributeNames "MessageRetentionPeriod" -QueueUrl $queueUrl).Attributes.MessageRetentionPeriod
# Receive messages from our queue.
$queueMessages = #(receive-sqsmessage -QueueUrl $queueUrl -WaitTimeSeconds 5 -AttributeNames SentTimestamp)
foreach($message in $queueMessages)
{
# The sent timestamp is in epoch time.
$sentTimestampUnix = $message.Attributes.SentTimestamp
# For powershell, we need to do some quick conversion to get a DateTime.
$sentTimestamp = ([datetime]'1970-01-01 00:00:00').AddMilliseconds($sentTimestampUnix)
# Get the expiration time by adding the retention period to the sent time.
$expirationTime = $sentTimestamp.AddDays($messageRetentionPeriod / 86400 )
# I want my cutoff date to be one hour before the expiration time.
$cutoffDate = $expirationTime.AddHours(-1)
# Check if the cutoff date has passed.
if((Get-Date) -ge $cutoffDate)
{
# Cutoff Date has passed, move to deadletter queue
Send-SQSMessage -QueueUrl $deadLetterQueueUrl -MessageBody $message.Body
remove-sqsmessage -QueueUrl $queueUrl -ReceiptHandle $message.ReceiptHandle -Force
}
else
{
# Cutoff Date has not passed. Retry job?
}
}
This will add some overhead to every message you process. This also assumes that your message handler will receive the message inbetween the cutoff time and the expiration time. Make sure that your application is polling often enough to receive the message.

RabbitMQ - combine Work Queues and Routing Queues

I am building a system where a Producer sends a list of tasks to be queued which will be consumed by a number of Consumers.
Assume I have a list of tasks and they can be categorised into Black, Orange and Yellow. All the Black tasks are sent to Queue_0, Orange to Queue_1 and Yellow to Queue_2. And I will assign a worker to each queue(i.e: Consumer_0 to Queue_0, Consumer_1 to Queue_1 and Consumer_2 to Queue_2). If Black lists get larger, I want to add an extra Consumer(i.e: Consumer_3) to Queue_0 to aid Consumer_0.
I went through RabbitMQ tutorials on Worker Queues and Routing. I thought Routing will solve my problem. I launched three terminals, a producer and two consumers which will receive Black tasks. When the producer sends a few black tasks(Black_Task_1, Black_Task_2), both consumers received the two messages (i.e: Consumer_0 receives Black_Task_1 and Black_Task_2, Consumer_3 also receives Black_Task_1 and Black_Task_2) . I want my consumers to share the task, not do the same task. Example, Consumer_0 does Black_Task_1 while Consumer_3 does Black_Task_2. What configurations can I achieve that?
=============================
Update
This is a sample code taken from RabbitMQ, routing tutorial. I modified a little. Note that this code doesn't sent Black, Orange or Yellow queues. But the concept is there.
emit_log_direct.py
#!/usr/bin/env python
import pika
import sys
connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()
channel.exchange_declare(exchange='direct_logs',
type='direct')
severity = sys.argv[1] if len(sys.argv) > 1 else 'info'
message = ' '.join(sys.argv[2:]) or 'Hello World!'
channel.basic_publish(exchange='direct_logs',
routing_key=severity,
body=message)
print " [x] Sent %r:%r" % (severity, message)
connection.close()
receive_logs_direct.py
#!/usr/bin/env python
import pika
import sys
import time
connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()
channel.exchange_declare(exchange='direct_logs',
type='direct')
result = channel.queue_declare(exclusive=True)
queue_name = result.method.queue
severities = sys.argv[1:]
if not severities:
print >> sys.stderr, "Usage: %s [info] [warning] [error]" % \
(sys.argv[0],)
sys.exit(1)
for severity in severities:
channel.queue_bind(exchange='direct_logs',
queue=queue_name,
routing_key=severity)
print ' [*] Waiting for logs. To exit press CTRL+C'
def callback(ch, method, properties, body):
print " [x] %r:%r" % (method.routing_key, body,)
time.sleep(1)
print " [x] Done"
ch.basic_ack(delivery_tag=method.delivery_tag)
channel.basic_qos(prefetch_count=1)
channel.basic_consume(callback,
queue=queue_name)
channel.start_consuming()
Producer
nuttynibbles$ ./4_emit_log_direct.py info "run run info"
[x] Sent 'info':'run run info'
Consumer_0
nuttynibbles$ ./4_receive_logs_direct_customize.py info
[*] Waiting for logs. To exit press CTRL+C
[x] 'info':'run run info'
[x] Done
Consumer_3
nuttynibbles$ ./4_receive_logs_direct_customize.py info
[*] Waiting for logs. To exit press CTRL+C
[x] 'info':'run run info'
[x] Done
I think your basic issue is with this:
If Black lists queue get larger, I want to add an extra Consumer(i.e:
Consumer_3) to Queue_0 to aid Consumer_0.
As soon as you add another consumer to the queue - it will pick up the next available message.
If the first consumer does not acknowledge the message; then multiple workers will be able to work on the same message as it will remain on the queue.
So make sure you are correctly acknowledging the messages:
By default, RabbitMQ will send each message to the next consumer, in
sequence. On average every consumer will get the same number of
messages. This way of distributing messages is called round-robin.
[...]
There aren't any message timeouts; RabbitMQ will redeliver the message
only when the worker connection dies. It's fine even if processing a
message takes a very, very long time.
Depending on the nature of the task, you may be able to split the work between multiple processes by creating a priority queue; which is used by C1 (a consumer) to get additional resources. In this case you'll have to have workers ready and listening on the separate priority queue; thus creating a sub-queue where T1 (a task) is being processed.
However, in other to do this, the initial C1 has to make sure the task is no longer available by acknowledging its receipt.
I think that your problem is that you are creating a new Queue for each consumer. When you call
result = channel.queue_declare(exclusive=True)
queue_name = result.method.queue
in your consumer, this declares a new queue, tells RabbitMQ to generate a unique name for it, and marks it for exclusive use by the channel in the consumer that is calling it. That means that each consumer will have its own queue.
You then bind each new Queue to the exchange using the severity as a routing key. When a message comes into a direct Exchange, RabbitMQ will route a copy of it to every Queue that is bound with a matching routing key. There is no round-robin across the Queues. Each consumer will get a copy of the message, which is what you are observing.
I believe what you want to do is have each consumer use the same name for the queue, specify the name in the queue_declare, and don't make it exclusive. Then all the consumers will be listening to the same queue. The messages will be delivered to one of the consumers, basically in a round-robin fashion.
The producer (the emit_log.py program) doesn't declare or bind the queue - it doesn't have to, but if the binding isn't established before the message is sent, it will be discarded. If you are using a fixed queue, you can have the producer set it up as well, just be sure to use the same parameters (e.g. queue_name) as the consumer.