Handling exceptions using AWS SNS Topic and AWS SQS queue - amazon-web-services

We have implemented an event-driven application made up of around 7 Spring Boot microservices. As part of the happy flow, these microservices listen to an AWS SQS queue (say app_queue) that is subscribed to an AWS SNS topic (say app_topic).
For exception scenarios we have implemented something like below:
Categorised the exceptions as 500 (internal server error) and 400 (bad request) category errors.
In both scenarios, we drop a message on an SNS topic named status_topic. An SQS queue named status_queue is subscribed to this topic. We did this with the idea that, once end-to-end development of the happy scenarios is done, we will handle the messages in status_queue in such a way that the production support team can remediate both 500 and 400 errors.
So basically: app_topic --> app_queue --> Microservice Error --> status_topic --> status_queue
We need some expert advice on whether the approach below would cause any issues:
A k8s cron job (a Spring Boot microservice) would come up every night, read the error messages from status_queue, and pick up only the 500 internal server errors for retry.
The cron job will push those messages back to the normal SNS app_topic, which will retrigger processing so the normal business flow continues.
Is the above approach acceptable?
I know that DLQs are more suitable for such scenarios, but can I drop a message to a DLQ directly from my Java app code?
Regardless of the approach we take, is there a way to automatically replay messages from a specific queue (normal or DLQ) so we don't have to write a separate microservice to replay messages?
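To make the nightly retry job concrete, here is a minimal sketch of the loop it would run. It is written in Python for brevity rather than as the Spring Boot service described above. The error record layout (a JSON body with "status" and "original_message" fields) and the function names are assumptions made for illustration; `sqs` and `sns` are expected to behave like boto3 clients, using only receive_message, delete_message and publish.

```python
import json


def unwrap(body):
    """Return the error record, unwrapping the SNS envelope if present."""
    record = json.loads(body)
    # Without raw message delivery, SNS wraps the payload in a Notification envelope
    if record.get("Type") == "Notification" and "Message" in record:
        record = json.loads(record["Message"])
    return record


def redrive_server_errors(sqs, sns, queue_url, topic_arn, max_batches=100):
    """Republish 5xx errors from the status queue to the app topic.

    Assumes (hypothetically) that each error record looks like
    {"status": 500, "original_message": "<original payload>"}.
    Returns the number of messages that were republished.
    """
    republished = 0
    for _ in range(max_batches):
        response = sqs.receive_message(
            QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=1
        )
        messages = response.get("Messages", [])
        if not messages:
            break  # nothing visible any more, stop for this run
        for message in messages:
            error = unwrap(message["Body"])
            if not 500 <= int(error.get("status", 0)) < 600:
                continue  # leave 4xx errors in the queue for the support team
            # Publish first, delete second: a failed publish keeps the message
            sns.publish(TopicArn=topic_arn, Message=error["original_message"])
            sqs.delete_message(
                QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
            )
            republished += 1
    return republished
```

With real clients this would be called as redrive_server_errors(boto3.client("sqs"), boto3.client("sns"), queue_url, topic_arn). One thing the sketch does not handle is a message that fails with a 500 every time: it would be replayed every night, so a retry counter in the error record is probably worth adding.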

Related

AWS Lambda Custom Event Chatbot to Slack Integration

Before I waste too much time on this, I was wondering whether it is technically possible to send a custom event from a Lambda to EventBridge to SNS to Chatbot to Slack.
I have written all the infrastructure and I know that it works for non-custom messages. So if the rule matches a source of aws.lambda, then when I deploy the Lambda I eventually get the Slack notification.
However, if I change the rule to a custom source and use that source in the Lambda code, the SDK call succeeds but no Slack message arrives. After turning on Chatbot logging I get the following message: Event received is not supported (see https://docs.aws.amazon.com/chatbot/latest/adminguide/related-services.html )
I am hoping against hope that I am simply not sending something in the SDK put events call that this integration needs, although the API call only offers a limited set of fields that you can change.
I did notice that the message sent to Slack from a standard event is much bigger than the one sent as a custom event.
Realistically, it looks like the Chatbot Slack integration is an extremely limited one, confined to standard events from a subset of services.
Can someone confirm whether this is possible, or am I right in my conclusion about the limitations of the integration?

Move Google Pub/Sub Messages Between Topics

How can I bulk move messages from one topic to another in GCP Pub/Sub?
I am aware of the Dataflow templates that provide this; unfortunately, restrictions do not allow me to use the Dataflow API.
Any suggestions on ad-hoc movement of messages between topics (besides copying and pasting them one by one)?
Specifically, the use case is moving messages in a dead-letter topic back into the original topic for reprocessing.
You can't use snapshots, because a snapshot can only be applied to subscriptions of the same topic (to avoid message ID overlap).
The easiest way is to write a function that pulls from your subscription. Here is how I would do it:
Create a topic (named, for example, "transfer-topic") with a push subscription. Set the timeout (acknowledgement deadline) to 10 minutes
Create an HTTP Cloud Function triggered by the Pub/Sub push subscription (or a Cloud Run service). When you deploy it, set the timeout to 9 minutes for a Cloud Function and to 10 minutes for Cloud Run. The processing does the following:
Read a chunk of messages (for example 1000) from the dead-letter pull subscription
Publish the messages (in bulk mode) to the initial topic
Acknowledge the messages on the dead-letter subscription
Repeat until the pull subscription is empty
Return code 200.
The overall process:
Publish a message to the transfer-topic
The message triggers the function/Cloud Run service through an HTTP push
The process pulls the messages and republishes them to the initial topic
If the timeout is reached, the function crashes and Pub/Sub retries the HTTP request (with exponential backoff).
If all the messages are processed, the HTTP 200 response code is returned and the process stops (and the message in the transfer-topic subscription is acked)
This process lets you handle a very large number of messages without worrying about the timeout.
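The loop at the heart of that function could look roughly like the sketch below. It is a variation on the steps above rather than a literal transcription: instead of letting the function crash at the timeout, it stops shortly before and returns a non-2xx status, which a push subscription also treats as "not acknowledged, retry later". The three callables (pull_batch, publish_batch, ack_batch) and the 500-second budget are assumptions for illustration; in a real deployment they would wrap the Pub/Sub client calls.

```python
import time


def drain(pull_batch, publish_batch, ack_batch,
          budget_seconds=500.0, clock=time.monotonic):
    """Move messages until the source is empty or the time budget runs out.

    pull_batch() returns a list of (ack_id, data) tuples, empty when drained.
    Returns (moved_count, finished); finished is False when time ran out.
    """
    deadline = clock() + budget_seconds
    moved = 0
    while clock() < deadline:
        batch = pull_batch()
        if not batch:
            return moved, True
        # Republish before acknowledging so a failure never loses messages
        publish_batch([data for _, data in batch])
        ack_batch([ack_id for ack_id, _ in batch])
        moved += len(batch)
    return moved, False


def handle_push(pull_batch, publish_batch, ack_batch, **drain_options):
    """HTTP-style handler: 200 acks the trigger message, 500 asks for a retry."""
    moved, finished = drain(pull_batch, publish_batch, ack_batch, **drain_options)
    return (200 if finished else 500), f"moved {moved} messages"
```

Because messages are published before they are acked, a crash between the two steps produces duplicates rather than losses, so the consumers of the initial topic should tolerate seeing a message twice.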
I suggest that you use a Python script for that.
You can use the Pub/Sub Python client library to read the messages and publish them to another topic, like below:
from google.cloud import pubsub
from google.cloud.pubsub import types

# Defining parameters
PROJECT = "<your_project_id>"
SUBSCRIPTION = "<your_current_subscription_name>"
NEW_TOPIC = "projects/<your_project_id>/topics/<your_new_topic_name>"

# Creating clients for publishing and subscribing. Adjust max_messages for your purpose
subscriber = pubsub.SubscriberClient()
publisher = pubsub.PublisherClient(
    batch_settings=types.BatchSettings(max_messages=500),
)

# Get your messages. Adjust max_messages for your purpose
# (client library 2.x takes these as keyword arguments)
subscription_path = subscriber.subscription_path(PROJECT, SUBSCRIPTION)
response = subscriber.pull(subscription=subscription_path, max_messages=500)

# Publish your messages to the new topic; publish() returns a future per message
futures = [
    publisher.publish(NEW_TOPIC, msg.message.data)
    for msg in response.received_messages
]
# Wait until every publish has succeeded before acknowledging anything
for future in futures:
    future.result()

# Ack the old subscription if necessary (skip the call when nothing was pulled)
ack_ids = [msg.ack_id for msg in response.received_messages]
if ack_ids:
    subscriber.acknowledge(subscription=subscription_path, ack_ids=ack_ids)
Before running this code you will need to install the Pub/Sub client library in your Python environment. You can do that by running pip install google-cloud-pubsub
One way to execute your code is with Cloud Functions. If you decide to use it, pay attention to two points:
The maximum time that your function can take to run is 9 minutes. If this timeout is exceeded, your function will terminate without finishing the job.
In Cloud Functions you can just put google-cloud-pubsub in a new line of your requirements file instead of running a pip command.

How to load test SNS-->SQS-->LAMBDA

I have a solution
SNS-->SQS-->LAMBDA-->ES (Elasticsearch)
I want to test this under heavy load, like 5K or 10K requests to SNS per second.
The test records can be very small (1 KB) and can be any type of JSON record.
Is there any way to test this load? I did not find anything native to AWS for this test.
You could try JMeter. JMeter has support for testing JMS interfaces for messaging systems. You can use the AWS Java SDK to get an SNS JMS interface
Agreed, you can use JMeter to load test SNS. Create a Java Request sampler class that uses the AWS SDK library to publish messages to the SNS topic, build a jar, and install it under lib/ext.
https://github.com/JoseLuisSR/awsmeter
In this repository you can find Java Request sampler classes that publish messages to a Standard topic or a FIFO topic; depending on the kind of topic, you need to set other message properties, such as the deduplication ID or group ID for a FIFO topic.
Here you can find details to subscribe SQS queue to SNS topic.
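If a full JMeter setup is more than you need, a small script can also generate the load. The sketch below is one possible shape, in Python rather than the Java samplers mentioned above. It assumes an object that behaves like a boto3 SNS client (only publish_batch is used, which accepts at most 10 entries per call); the function names and the pacing logic are made up for illustration.

```python
import json
import time
import uuid


def make_batches(records, batch_size=10):
    """Group records into PublishBatch entry lists of at most batch_size."""
    for start in range(0, len(records), batch_size):
        yield [
            {"Id": str(uuid.uuid4()), "Message": json.dumps(record)}
            for record in records[start:start + batch_size]
        ]


def run_load(sns, topic_arn, records, target_per_second,
             sleep=time.sleep, clock=time.monotonic):
    """Publish records at roughly target_per_second; return (sent, failed)."""
    sent = failed = 0
    started = clock()
    for entries in make_batches(records):
        response = sns.publish_batch(
            TopicArn=topic_arn, PublishBatchRequestEntries=entries
        )
        sent += len(response.get("Successful", []))
        failed += len(response.get("Failed", []))
        # Sleep only when we are ahead of the schedule implied by the target rate
        ahead = (sent + failed) / target_per_second - (clock() - started)
        if ahead > 0:
            sleep(ahead)
    return sent, failed
```

A single loop like this is unlikely to reach 5K to 10K messages per second on its own, so in practice you would probably run several copies in parallel (threads, processes or machines) and split the target rate between them.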

Event Driven MessageBus architecture with AWS SNS: one or many message buses/ lambda action functions

I am implementing a process in my AWS based hosting business with an event driven architecture on AWS SNS. This is largely a learning experience with a new architecture, programming and hosting paradigm for me.
I have considered AWS Step functions, but have decided to implement a Message Bus with AWS SNS topic(s), because I want to understand the underlying event driven programming model.
Nearly all actions are performed by lambda functions and steps are coupled via SNS and/or SQS.
I am undecided whether to implement the process with one or many SNS topics, and whether I should subscribe the core logic to the message bus(es) with one or many lambda functions.
One or many message buses
My core process currently consists of 9 events, of which 2 sets of 2 can run in parallel; the remaining 4 are sequential. Subscribing them all to the same message bus is easier to set up, but requires each lambda function to check whether the message is relevant to it, which seems like a waste of resources.
On the other hand I could have 6 message buses and be sure that a notified resource has something to do with the message.
One or many lambda functions
If all lambda functions are subscribed to the same message bus, it may be easier to package them all up with a dispatcher function in a single lambda function. It would also reduce the amount of code to upload to Lambda, although I don't have to pay for that.
On the other hand, I would lose the ability to control the timeout for each lambda function, and any change to the order of events would then depend on the dispatcher code.
I would still have the ability to scale each part of the process, as any parts that contain repeating elements are separated by SQS queues.
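For reference, the single-function option with a dispatcher might look like the sketch below. It relies on the standard shape of an SNS-triggered Lambda event (Records[].Sns.Message), but the "type" and "payload" fields inside the message, the event name and the handler are hypothetical.

```python
import json

HANDLERS = {}


def handles(event_type):
    """Register a function as the handler for one event type."""
    def register(func):
        HANDLERS[event_type] = func
        return func
    return register


@handles("order_created")  # hypothetical event name
def on_order_created(payload):
    return f"provisioning {payload['order_id']}"


def lambda_handler(event, context):
    """Entry point for an SNS-triggered Lambda: route each record by its type."""
    results = []
    for record in event["Records"]:
        message = json.loads(record["Sns"]["Message"])
        handler = HANDLERS.get(message.get("type"))
        if handler is None:
            continue  # this message is not relevant to any registered handler
        results.append(handler(message.get("payload", {})))
    return results
```

This shows the trade-off described above: adding a step is one decorated function, but every step shares the same timeout and memory settings, and the routing lives in code instead of in the topic subscriptions.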
You should always emit each type of message to its own topic, as this allows other services to consume these events without tightly coupling the two services.
Likewise, each worker that wants to consume messages should have its own queue with its own subscription to the topic.
Doing this allows you to add new message consumers for a given event without having to modify the upstream service. Furthermore, responsibility over each component is clear - the service producing messages to a topic owns that topic (and the message format), whereas the consumer owns its queue and event handling semantics.
Your consumer can specify a message filter when subscribing to a topic, so it only receives the messages it cares about (documentation).
For example, a process that sends a customer survey after the customer has received their order would subscribe its queue to the Order Status Changed event with the filter set to only receive events where the new_status field is equal to shipment-received.
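As a rough illustration of that survey example, the subscription parameters could be built like this. The sketch assumes new_status is a field in the JSON message body, which is why FilterPolicyScope is set to MessageBody; if it were sent as an SNS message attribute instead, that line would be dropped. The function name and ARNs are placeholders.

```python
import json


def survey_subscription_params(topic_arn, queue_arn):
    """Build keyword arguments for an SNS Subscribe call with a payload filter."""
    # Only deliver events whose new_status equals "shipment-received"
    filter_policy = {"new_status": ["shipment-received"]}
    return {
        "TopicArn": topic_arn,
        "Protocol": "sqs",
        "Endpoint": queue_arn,
        "Attributes": {
            "FilterPolicy": json.dumps(filter_policy),
            # Assumption: the field lives in the message body, not in attributes
            "FilterPolicyScope": "MessageBody",
        },
    }
```

With boto3 this would be passed along as boto3.client("sns").subscribe(**survey_subscription_params(topic_arn, queue_arn)); the queue also needs an access policy that lets the topic send messages to it.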
The above reflects principles of Service-Oriented architecture - and there's plenty of good material out there elaborating the points above.

Monitoring the status of a google pub/sub submitted job

I am new to the Google Compute/Google App Engine platform. I am currently migrating a Python Flask application that uses Celery for async tasks to it. However, the docs say I should use Google Pub/Sub instead of Celery. In my application, whenever I run an async task I have a page to monitor the status of the job, using the same principle as http://blog.miguelgrinberg.com/post/using-celery-with-flask. I have checked the documentation for Google Pub/Sub, but I am at a loss as to how to implement the same thing with it. Can anybody help or point me in the right direction?
You might be able to use psq for this, which is designed to look like celery. From a general Cloud Pub/Sub perspective, you would follow these steps:
Create a topic for your status update messages.
In the async task whose status you want to monitor, periodically publish a message with the status. This message will be of some format of your choosing that would indicate percentage completion or specific message to display.
Create a subscription for your monitoring page that will receive messages on the topic.
In your monitoring page (or a background process that will supply the data to your monitoring page), pull messages for the subscription.
Process the messages and update the state of your jobs for your monitoring page.
Ack the messages you pulled and processed.
A couple of things to keep in mind in this workflow:
Cloud Pub/Sub guarantees at-least-once delivery. That means you could potentially receive the same message more than once.
Cloud Pub/Sub does not guarantee ordering by default. Therefore, if you are periodically publishing status updates, your subscriber could potentially receive these out of order. For your case, you'll probably want each message to include some sort of timestamp or strictly-increasing identifier to sequence your status updates per task. If you keep track of the most recent status update received, then you can disregard older messages and ack them immediately.
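The bookkeeping for those two points can be quite small. Here is a minimal sketch of the subscriber side, assuming each status message is JSON with task_id, sequence and status fields (names chosen for illustration, not anything Pub/Sub prescribes).

```python
import json


class StatusTracker:
    """Keeps the newest status per task and ignores duplicates or stale updates."""

    def __init__(self):
        self._latest = {}  # task_id -> (sequence, status)

    def apply(self, task_id, sequence, status):
        """Record an update; return False if it is a duplicate or out of date."""
        current = self._latest.get(task_id)
        if current is not None and sequence <= current[0]:
            return False
        self._latest[task_id] = (sequence, status)
        return True

    def status(self, task_id):
        """Return the most recent known status, or None for an unknown task."""
        current = self._latest.get(task_id)
        return None if current is None else current[1]


def handle_message(tracker, data):
    """Decode one pulled message payload (bytes or str) and apply it."""
    update = json.loads(data)
    return tracker.apply(update["task_id"], update["sequence"], update["status"])
```

Whether handle_message returns True or False, the message can be acked afterwards: a False result just means the update was a redelivery or arrived after a newer one.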