AWS SQS FIFO ApproximateNumberOfMessages

I have 2 FIFO queues (with priority).
I need to process only one message at a time.
I can't simply take messages from the high-priority queue (process one if there is any, otherwise fall back to the low-priority queue) - I also need to know whether there are any messages in flight, and in which queue.
So I load ApproximateNumberOfMessages and ApproximateNumberOfMessagesNotVisible for each queue and analyze them.
This is the relevant part of my code:
if (highPriorityQueueState.MessagesInFlight || lowPriorityQueueState.MessagesInFlight) {
    return Promise.resolve();
}
if (highPriorityQueueState.MessagesInQueue) {
    return _receiveMessageFromQueue(highPriorityQueueUrl);
}
if (lowPriorityQueueState.MessagesInQueue) {
    return _receiveMessageFromQueue(lowPriorityQueueUrl);
}
return Promise.resolve();
The question is: can I rely on this logic? The Amazon documentation ( http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-resources-required-process-messages.html ) says that for FIFO queues these values are exact, but I'm not sure. Does anyone know how to process 2 queues with exactly one message in flight at a time?
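For reference, loading those two attributes might look roughly like this with boto3 (a minimal sketch; the question's code is Node.js, the region is an assumption, and the MessagesInQueue / MessagesInFlight field names simply mirror the question's variables):
import boto3

sqs = boto3.client('sqs', region_name='us-east-1')  # region is an assumption

def load_queue_state(queue_url):
    # Ask SQS for the visible and the in-flight (not visible) message counts.
    attrs = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=[
            'ApproximateNumberOfMessages',
            'ApproximateNumberOfMessagesNotVisible',
        ],
    )['Attributes']
    # Field names mirror the variables used in the question's snippet.
    return {
        'MessagesInQueue': int(attrs['ApproximateNumberOfMessages']),
        'MessagesInFlight': int(attrs['ApproximateNumberOfMessagesNotVisible']),
    }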

Related

Kafka consumer poll newest message

I am using CppKafka to program a Kafka consumer. I want my consumer, when it starts, to poll only newly arriving messages (i.e. messages that arrive after the consumer's start time) instead of messages at the consumer's committed offset.
// Construct the configuration
Configuration config = {
    { "metadata.broker.list", "127.0.0.1:9092" },
    { "group.id", "1" },
    // Disable auto commit
    { "enable.auto.commit", false },
    // Set offset to latest to receive the latest messages when the consumer starts working
    { "auto.offset.reset", "latest" },
};
// Create the consumer
Consumer consumer(config);
consumer.set_assignment_callback([](TopicPartitionList& partitions) {
    cout << "Got assigned: " << partitions << endl;
});
// Print the revoked partitions on revocation
consumer.set_revocation_callback([](const TopicPartitionList& partitions) {
    cout << "Got revoked: " << partitions << endl;
});
string topic_name = "test_topic";
// Subscribe to the topic
consumer.subscribe({ topic_name });
As I understand it, setting auto.offset.reset to latest only works if the consumer has no committed offset when it starts reading an assigned partition. So my guess is that I should call consumer.poll() without committing, but it feels wrong and I am afraid I will break something along the way. Can anyone show me the right way to achieve my requirement?
If "enable.auto.commit" is set as false and you do not commit offsets in your code, then every time your consumers starts it starts message consumption from the first message in the topic if auto.offset.reset=earliest.
The default for auto.offset.reset is “latest,” which means that lacking a valid offset, the consumer will start reading from the newest records (records that were written after the consumer started running).
Based on your question above it looks like auto.offset.reset=latest should solve your problem.
But if you need a real time based offset you need to apply the time filter in your consumer. That means get the message from the topic compare offset time with either on some custom field in message payload or the meta attribute of the message (ConsumerRecord.timestamp())and do further processing accordingly.
Also refer to this answer Retrieve Timestamp based data from Kafka
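As a rough illustration of that time-based filter, a minimal sketch using the Python confluent-kafka client (not CppKafka; the broker address, group id, and topic name mirror the question's configuration) could look like this:
import time
from confluent_kafka import Consumer

start_ms = int(time.time() * 1000)  # consumer start time in epoch milliseconds

consumer = Consumer({
    'bootstrap.servers': '127.0.0.1:9092',
    'group.id': '1',
    'enable.auto.commit': False,
    'auto.offset.reset': 'latest',
})
consumer.subscribe(['test_topic'])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    ts_type, ts_ms = msg.timestamp()  # broker/producer timestamp in milliseconds
    if ts_ms < start_ms:
        continue  # produced before the consumer started, skip it
    print(msg.value())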
Use the seekToEnd(Collection partitions) method.
Seek to the last offset for each of the given partitions. This function evaluates lazily, seeking to the final offset in all partitions only when poll(long) is called. If no partitions are provided, seek to the final offset for all of the currently assigned partitions.
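seekToEnd is the Java consumer API; with the Python confluent-kafka client the same idea can be sketched by overriding the assigned offsets in the assignment callback (a minimal sketch, not CppKafka; broker address, group id, and topic name mirror the question):
from confluent_kafka import Consumer, OFFSET_END

def seek_to_end(consumer, partitions):
    # Jump every assigned partition to its end offset so only messages
    # produced after this point are consumed.
    for p in partitions:
        p.offset = OFFSET_END
    consumer.assign(partitions)

consumer = Consumer({
    'bootstrap.servers': '127.0.0.1:9092',
    'group.id': '1',
    'enable.auto.commit': False,
})
consumer.subscribe(['test_topic'], on_assign=seek_to_end)
Polling then proceeds as usual; because the callback seeks to the end on every assignment, each restart begins from the newest offsets regardless of any stored offsets.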

fetching all the messages from AWS FIFO SQS

I am trying to fetch messages from a FIFO SQS queue. Here is the sample code:
import boto3

sqs_client = boto3.resource(
    'sqs',
    #aws_access_key_id=AWS_ACCESS_KEY,
    #aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    region_name='us-east-2'
)

queue_name = 'test_queue.fifo'
response = sqs_client.create_queue(
    QueueName=queue_name,
    Attributes={
        'FifoQueue': 'true',
        'ContentBasedDeduplication': 'true'
    }
)

for i in range(0, 50):
    status = response.send_message(
        MessageBody='This is test message #' + str(i),
        MessageGroupId='586474de88e03'
    )

while True:
    messages = response.receive_messages(MaxNumberOfMessages=10)
    if len(messages) > 0:
        for message in messages:
            print(message.body)
    else:
        print('Queue is now empty')
        break
But I am getting only the first 10 messages and then it shows "Queue is now empty", although I can see in the AWS console that there are 40 available messages in the queue.
So I want to fetch all the messages from the queue in a loop. Any lead would be appreciated. Thanks.
When there is a small number of messages in an SQS queue, especially an extremely small number as in your case, you may not get any messages returned and may need to retry the call:
Short poll is the default behavior where a weighted random set of machines is sampled on a receive-message call. Thus, only the messages on the sampled machines are returned. If the number of messages in the queue is small (fewer than 1,000), you most likely get fewer messages than you requested per receive-message call. If the number of messages in the queue is extremely small, you might not receive any messages in a particular receive-message response. If this happens, repeat the request.
Also, generally speaking, once you receive a set of messages, you process them and then delete the messages that you processed - for testing purposes at least, you may want to delete each returned message after each print(message.body), and before you make the next receive request.
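A minimal sketch of that advice, reusing the response queue object from the question (the retry count and long-poll wait time are arbitrary choices, not requirements):
empty_receives = 0
while empty_receives < 3:
    messages = response.receive_messages(
        MaxNumberOfMessages=10,
        WaitTimeSeconds=10,  # long polling reduces empty responses
    )
    if not messages:
        empty_receives += 1
        continue
    empty_receives = 0
    for message in messages:
        print(message.body)
        message.delete()  # remove the message so later receives can make progress
print('Queue is now empty')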
Your question: "I want to fetch all the messages from the queue in a loop." My answer (read it completely): for a FIFO queue, read a message, then send that same message back to the queue and delete the original from the queue. This is only safe if you do it with proper exception handling and a proper message handler. Try writing your Python programs with proper loggers and make them fail-safe; your current code is not fail-safe.

How do I use select() and gRPC to create a server?

I need to use gRPC but in a single-threaded application (with additional socket channels). Naively, I'm thinking of using select() and, depending on which file descriptor pops, calling gRPC to handle the message. My question is, can someone give me a rough (5-10 lines of code) outline of what I need to call after the select() pops?
Looking at Google's "hello world" example in the synchronous case implies a thread pool (which I can't use), and in the asynchronous case shows the main loop blocking -- which doesn't work for me because I need to handle other socket operations.
You can't do it, at this point (and probably ever).
One of the big weaknesses of event loops, including direct use of select()/poll() style APIs, is that they aren't composable in any natural way short of direct integration between the two.
We could theoretically add such functionality for Linux -- exporting an epoll_fd with a timerfd which becomes readable if it would be productive to call into a completion queue -- but doing so would impose substantial constraints and architectural overhead on the rest of the stack just to support this use case only on Linux. Everywhere else would require a background thread to manage that fd's readability.
This can be done using a gRPC async service along with grpc::Alarm to send any events that come from select or other polling APIs onto the gRPC completion queue. You can see an example using Epoll and gRPC together in this gist. The important functions are these two:
bool grpc_tick(grpc::ServerCompletionQueue& queue) {
    void* tag = nullptr;
    bool ok = false;
    auto next_status = queue.AsyncNext(&tag, &ok, std::chrono::system_clock::now());
    if (next_status == grpc::CompletionQueue::GOT_EVENT) {
        if (ok && tag) {
            static_cast<RequestProcessor*>(tag)->grpc_queue_tick();
        } else {
            std::cerr << "Not OK or bad tag: " << ok << "; " << tag << std::endl;
            return false;
        }
    }
    return next_status != grpc::CompletionQueue::SHUTDOWN;
}

bool tick_loops(int epoll, grpc::ServerCompletionQueue& queue) {
    // Pump epoll events over to gRPC's completion queue.
    epoll_event event{0};
    while (epoll_wait(epoll, &event, /*maxevents=*/1, /*timeout=*/0)) {
        grpc::Alarm alarm;
        alarm.Set(&queue, std::chrono::system_clock::now(), event.data.ptr);
        if (!grpc_tick(queue)) return false;
    }
    // Make sure gRPC gets at least 1 tick.
    return grpc_tick(queue);
}
Here you can see the tick_loops function repeatedly calls epoll_wait until no more events are returned. For each epoll event, a grpc::Alarm is constructed with the deadline set to right now. After that, the gRPC event loop is immediately pumped with grpc_tick.
Note that the grpc::Alarm instance MUST outlive its time on the completion queue. In a real-world application, the alarm should be somehow attached to the tag (event.data.ptr in this example) so it can be cleaned up in the completion callback.
The gRPC event loop is then pumped again to ensure that any non-epoll events are also processed.
Completion queues are thread safe, so you could also put the epoll pump on one thread and the gRPC pump on another. With this setup you would not need to set the polling timeouts for each to 0 as they are in this example. This would reduce CPU usage by limiting dry cycles of the event loop pumps.

How to limit an Akka Stream to execute and send down one message only once per second?

I have an Akka Stream and I want the stream to send messages down stream approximately every second.
I tried two ways to solve this problem. The first way was to make the producer at the start of the stream send a message only once every second, when a Continue message comes into this actor.
// When a Continue message is received in an ActorPublisher
// do work, then...
if (totalDemand > 0) {
    import scala.concurrent.duration._
    context.system.scheduler.scheduleOnce(1.second, self, Continue)
}
This works for a short while, then a flood of Continue messages appears in the ActorPublisher actor - I assume (but am not sure) from downstream back-pressure requesting messages, since the downstream can consume quickly but the upstream is not producing at a fast rate. So this method failed.
The other way I tried was via back-pressure control: I used a MaxInFlightRequestStrategy on the ActorSubscriber at the end of the stream to limit the number of messages to 1 per second. This works, but messages come in approximately three or so at a time, not one at a time. It seems the back-pressure control doesn't immediately change the rate of incoming messages, OR messages were already queued in the stream and waiting to be processed.
So the problem is: how can I have an Akka Stream that processes only one message per second?
I discovered that MaxInFlightRequestStrategy is a valid way to do it, but I should set the batch size to 1; its batch size defaults to 5, which was causing the problem I saw. It is also an over-complicated way to solve the problem now that I am looking at the submitted answer here.
You can either put your elements through a throttling flow, which will back-pressure a fast source, or you can use a combination of tick and zip.
The first solution would be like this:
val veryFastSource =
  Source.fromIterator(() => Iterator.continually(Random.nextLong() % 10000))

val throttlingFlow = Flow[Long].throttle(
  // how many elements do you allow
  elements = 1,
  // in what unit of time
  per = 1.second,
  maximumBurst = 0,
  // you can also set this to Enforcing, but then your
  // stream will collapse if exceeding the number of elements / s
  mode = ThrottleMode.Shaping
)

veryFastSource.via(throttlingFlow).runWith(Sink.foreach(println))
The second solution would be like this:
val veryFastSource =
  Source.fromIterator(() => Iterator.continually(Random.nextLong() % 10000))

val tickingSource = Source.tick(1.second, 1.second, 0)

veryFastSource.zip(tickingSource).map(_._1).runWith(Sink.foreach(println))

How can I check if a message is about to pass the MessageRetentionPeriod?

I have an app that uses SQS to queue jobs. Ideally I want every job to be completed, but some are going to fail. Sometimes re-running them will work, and sometimes they will just keep failing until the retention period is reached. I want to keep failing jobs in the queue as long as possible, to give them the maximum possible chance of success, so I don't want to set a maxReceiveCount. But I do want to detect when a job reaches the MessageRetentionPeriod limit, as I need to send an alert when a job fails completely. Currently I have the max retention at 14 days, but some jobs will still not be completed by then.
Is there a way to detect when a job is about to expire, and from there send it to a deadletter queue for additional processing?
Before you follow my advice below: assuming I've done the period math correctly, you will be better off enabling a redrive policy on the queue if you check for messages less often than once every 20 minutes and 9 seconds.
SQS's "redrive policy" allows you to migrate messages to a dead letter queue after a threshold number of receives. The maximum receive count that AWS allows for this is 1000, and over 14 days that works out to about 20 minutes per receive. (For simplicity, that assumes your job never misses an attempt to read queue messages. You can tweak the numbers to build in a tolerance for failure.)
If you check more often than that, you'll want to implement the solution below.
You can check for this "cutoff date" (when the job is about to expire) as you process the messages, and send messages to the deadletter queue if they've passed the time when you've given up on them.
Pseudocode to add to your current routine:
Call GetQueueAttributes to get the count, in seconds, of your queue's MessageRetentionPeriod.
Call ReceiveMessage to pull messages off of the queue. Make sure to explicitly request that the SentTimestamp attribute is visible.
For each message:
    Find the message's expiration time by adding the message retention period to the sent timestamp.
    Create your cutoff date by subtracting your desired amount of time from the message's expiration time.
    Compare the cutoff date with the current time. If the cutoff date has passed:
        Call SendMessage to send your message to the dead letter queue.
        Call DeleteMessage to remove your message from the queue you are processing.
    If the cutoff date has not passed:
        Process the job as normal.
Here's an example implementation in Powershell:
$queueUrl = "https://sqs.amazonaws.com/0000/my-queue"
$deadLetterQueueUrl = "https://sqs.amazonaws.com/0000/deadletter"
# Get the message retention period in seconds
$messageRetentionPeriod = (Get-SQSQueueAttribute -AttributeNames "MessageRetentionPeriod" -QueueUrl $queueUrl).Attributes.MessageRetentionPeriod
# Receive messages from our queue.
$queueMessages = #(receive-sqsmessage -QueueUrl $queueUrl -WaitTimeSeconds 5 -AttributeNames SentTimestamp)
foreach($message in $queueMessages)
{
# The sent timestamp is in epoch time.
$sentTimestampUnix = $message.Attributes.SentTimestamp
# For powershell, we need to do some quick conversion to get a DateTime.
$sentTimestamp = ([datetime]'1970-01-01 00:00:00').AddMilliseconds($sentTimestampUnix)
# Get the expiration time by adding the retention period to the sent time.
$expirationTime = $sentTimestamp.AddDays($messageRetentionPeriod / 86400 )
# I want my cutoff date to be one hour before the expiration time.
$cutoffDate = $expirationTime.AddHours(-1)
# Check if the cutoff date has passed.
if((Get-Date) -ge $cutoffDate)
{
# Cutoff Date has passed, move to deadletter queue
Send-SQSMessage -QueueUrl $deadLetterQueueUrl -MessageBody $message.Body
remove-sqsmessage -QueueUrl $queueUrl -ReceiptHandle $message.ReceiptHandle -Force
}
else
{
# Cutoff Date has not passed. Retry job?
}
}
This will add some overhead to every message you process. It also assumes that your message handler will receive the message in between the cutoff time and the expiration time, so make sure that your application is polling often enough to receive the message.