MSK vs SQS + SNS - amazon-web-services

I am deciding if I should use MSK (managed kafka from AWS) or a combination of SQS + SNS to achieve a pub sub model?
Background
Currently, we have a micro service architecture but we don't use any messaging service and only use REST apis (dont ask why - related to some 3rd party vendors who designed the architecture). Now, I want to revamp it and start using messaging for communication between micro-services.
Initially, the plan is to start publishing entity events for any other micro service to consume - these events will also be stored in data lake in S3 which will also serve as a base for starting data team.
Later, I want to move certain features from REST to async communication.
Anyway, the main question I have is - should I decide to go with MSK or should I use SQS + SNS for the same? ( I already understand the basic concepts but wanted to understand from fellow community if there are some other pros and cons)?
Thanks in advance

MSK VS SQS+SNS is not really 1:1 comparison. The choice depends on various use cases. Please find out some of specific difference between two
Scalability ->
MSK has better scalability option because of inherent design of partitions that allow parallelism and ordering of message.
SNS has limitation of 300 publish/Second, to achieve same performance as MSK, there need to have higher number of SNS topic for same purpose.
Example : Topic: Order Service
in MSK -> one topic+ 10 Partitions
SNS -> 10 topics
if client/message producer use 10 SNS topic for same purpose, then client needs to have information of all 10 SNS topic and distribution of message.
In MSK, it's pretty straightforward, key needs to send in message and kafka will allocate the partition based on Key value.
Administration/Operation ->
SNS+SQS setup is much simpler compare to MSK.
Operational challenge is much more with MSK( even this is managed service). MSK needs more in depth skills to use optimally.
SNS +SQS VS SQS -> I believe you have multiple subscription(fanout) for same message thats why you have refer SNS +SQS.
If you have One Subscription for one message, then only SQS is also sufficient.
Replay of message -> MSK can be use for replaying the already processed message.
It will be tricky for SQS, though can be achieve by having duplicate queue so that can be use for replay.

Related

AWS - Server to Server pub-sub

We do have a system that is using Redis pub/sub features to communicate between different parts of the system. To keep it simple we used the pub/sub channel to implement different things. On both ends (producer and consumer), we do have Servers containing code that I see no way to convert into Lambda Functions.
We are migrating to AWS and among other changes, we are trying to replace the use of Redis with a managed pub/sub solution. The required solution is fairly simple: a managed broker that allows to publish a message from one node and to subscribe for its reception from 0 or more other nodes.
It seems impossible to achieve this with any of the available solutions:
Kinesis - It is a streaming solution for data ingestion (similar to Apache Pulsar)
SNS - From the documentation, it looks like exactly what we need until we realize that there is no solution to connect a server (not a Lambda) unless with a custom HTTP endpoint.
EventBridge - Same issue as with SNS
SQS - It is a queue, not a pub/sub.
Amazon MQ / Rabbit MQ - It is a queue, not a pub/sub. But also is not a SaaS solution but rather an installation to an owned node.
I see no reason to remove a feature such as subscribing from a server, this is why I was sure it will be present in one or more of the available solutions. But we went through the documentation and attempted to consume fro SNS and EventBridge without success. Are we missing something? How to achieve what we need?
Example
Assume we have an API server layer, deployed on ECS with a load balancer in front. The API has 2 endpoints, a PUT to update a document, and an SSE to listen for updates on documents.
Assuming a simple round-robin load balancer, an update for document1 may occur on node1 where a client may have an ongoing SSE request for the same document on node2. This can be done with a Redis backbone; node1 publishes on document1 topic and node2 is subscribed to the same topic. This solution is fast and efficient (in this case at-most-once delivery is perfectly acceptable).
Being this an example we will not consider WebSocket pub/sub API or other ready-made solutions for this specific use case.
Lambda
Subscriber side can not be a Lambda. Being two distinct contexts involved (the SSE HTTP Request one and the SNS event one) this will cause two distinct lambdas to fire and no way to 'stitch' them together.
SNS + SQS
We hesitate with SQS in conjunction with SNS being a solution that will add a lot of unneeded complexity:
Number of nodes is not known in advance and they scale, requiring an automated system to increase/reduce the number of SQS queues.
Persistence is not required
Additional latency is introduced
Additional infrastructure cost
HTTP Endpoint
This is the closest thing to a programmatic subscription but suffers from similar issues to the SNS-SQS solution:
Number of nodes is unknown requiring endpoint subscriptions to be automatically added.
Eiter we expose one endpoint for each node or have a particular configuration on the Load Balancer to route the message to the appropriate node.
Additional API endpoints must be exposed, maintained, and secured.

How to subscribe to AWS Event Bus events in a client application

How to subscribe to AWS Event Bus events from client side applications ex: NodeJS app, Angular client or a Mobile client app ?
In December 2020, an email from AWS Marketing has presented the advantages to use the Event-Driven architecture . Following the documentation and the tutorials, soon I stumble into the wall of not finding a way to subscribe to this events from a client side application.
The email states:
4 Reasons to Care About Event-Driven Architectures
Are you looking to scale and build robust applications without delays and dependencies? We break down the basics of event-driven architectures, how they work, and show you ways to get started. Learn how event-driven architectures can help you:
Scale and fail independently - No more dependencies
Develop with agility - No custom polling code
Audit with ease - Use your event router to define policies
Cut costs - Stop paying for continuous polling
The disappointing part is that there is no example of libraries to be integrated in the client side code to subscribe to those event. Googling does not return any significant result and the only current library for node: #aws-sdk/client-eventbridge-node only expose a send and destroy methods.
There is no way to directly subscribe to an Amazon EventBridge bus, as it doesn't provide publish/subscribe functionality. In order to process events in EventBridge you create event rules that filter and send matching events to targets. You can find all targets available to EventBridge rules on this list: Amazon EventBridge Targets.
One of these targets could be an Amazon SNS topic, which provides pub/sub functionality, i.e. your client application can subscribe to the topic to automatically receive the respective events.
This may sound complicated at first, but the implementation is strictly following the principle of separating concerns. It provides simple building blocks--like Lego pieces--that you can put together in order to create truly loosely coupled architectures.
This diagram shows the functionality in scope of Amazon Event Bridge and how it communicates with other services and applications.
Services that allow you to subscribe as you want it (directly delivering subscribed messages to your code over a tcp connection such as a websocket) are:
AppSync - websocket
IOT Core - websocket, mqtt
SQS - long poll
Kafka
(From the top of my head)
So a straightforward serverless solution for you could be:
Event bridge —> SQS —> your code
I often use AppSync for this purpose. But eventbridge is cool too.
if you want to avoid polling and you cant/want use AWS Lambda then you can reverse the problem and call an api on your application from EventBridge with a rule.
You can create API Targets in EventBridge:
API destination

AWS SNS vs AWS Step Functions

What's the better option to coordinate tasks between microservices?
For example, if I have a microservice that handles customer information and need to notifies other microservices, is it better to create a workflow (AWS Steps) between microservices or use a SNS?
I think AWS Steps will couple my lambda functions, and SNS not.
AWS Step Functions is a step machine that executes AWS Lambda functions. If your task involves "do this, then this" activities, then Step Functions could be a good option. It includes logic to determine the next step and automatically handles retries. It's the modern version of Amazon Simple Workflow (SWF).
Amazon Simple Notification Service (SNS) can also trigger Lambda functions, but it does not handle the logic nor the retries. It's a good fit for decoupled services, especially for fan-out where multiple subscribers receive the same message from a topic -- for example, for triggering multiple Lambda functions or sending multiple notifications. It's basically a public/subscribe service, of which Lambda is one of the subscriber types.
The choice will depend upon your particular use-case. If you don't want to redesign things to use Step Functions, then send notifications via SNS. If you sometimes send notifications (eg emails) rather than just trigger Lambda functions, use SNS.
Currently, Step Functions is not available in every region, while SNS is everywhere so that might also influence your choice.
It depends on what type of coordination you want.
Synchronous or Asynchronous.
If it is synchronous and if you really want some co-ordination between them, then Amazon Simple Notification Service (SNS) would not help and AWS Step Functions would be the way to go.
But if the requirement is asynchronous, and you just want to notify/invoke the microservices then SNS would be a better fit.
As I can read from your question "need to notify other microservices" I assume it is just about notifying them (as against to co-ordinating them) and each would know what to do further without relying on other microservices. And if that is true then SNS is a good fit.

using AWS SNS and Lambda - what's the right use case for an activity feed

I want to use an AWS lambda function to fan out and insert activity stream info to a firebase endpoint for every user.
Should I be using Kinesis, SQS or SNS to trigger the lambda function for this use case? The updates to the activity stream can be triggered from the server and clients should receive the update near real time (within 60 seconds or so).
I think I have a pretty good idea on what SQS is, and have used Kinesis in the past but not quite sure about SNS.
If we created an SNS topic for each user and then each follower subscribes to these topics with an AWS lambda function - would that work?
Does it make sense to programmatically create topics and subscriptions for every user and follow relationship respectively?
As usual, answer to such a question is mostly, 'it depends on your use-case'.
Kinesis vs SQS:
If your clients care about relative (timestamp-based, for e.g.) ordering between events, you'll almost certainly have to go with Kinesis. SQS is a best-effort FIFO queue, meaning events can arrive out of order and it would up to your client to manage relative ordering.
As far as latencies are concerned, I have seen that data ingested into Kinesis can become visible to its consumer in as less as 300 ms.
When can SNS be interesting to you?
(Even with SNS, you'd have to use SQS). If you use SNS, it will be easy to add a new application that can process your events. For example, if in future you decide to ingest all events into, say, an Elasticsearch to provide real-time analytics, all you'd have to do is add another SQS queue to your existing topic(s) and write a consumer.

Amazon SQS queue limits

My project requires me to communicate with many devices outside the cloud. If successful, this means millions of devices. These devices will not be running Android or iOS, and will be running behind routers & firewalls (I cannot assume they have an external IP).
I am looking to use SQS to send messages to my users outside the cloud. To allow the servers to message individual users, I am designing the system to have one queue per client. This can potentially mean millions (billions?) of queues. While it states that SQS can support unlimited queues, I would like to make sure that I am not abusing the system. If successful, the probability of millions of users is very high.
I am aware that SQS can be expensive, but I am using it at this stage
for ease of administration.
As far as I can tell SNS requires either an IOS/Android client, or an
HTTP server running on the consumer. This is why I ruled out SNS, and I
am using SQS.
I am going to build a distributed cloud front-end over SQS to handle
client connections. This front-end will just be a wrapper, that will
authenticate clients, and relay them to the SQS queues.
Am I abusing the SQS "unlimited queues" policy (will SQS performance drop)? Is there a simpler design for per device messaging?
Let me break the answer by the parts of your question:
About your questions:
Am I abusing the SQS "unlimited queues" policy?
AWS services are designed to prevent abuse and you will pay exactly for what you use, so if you believe this is the right approach, go for it. To remove the uncertainty, i'd advise for a preliminary "proof of concept" implementation.
Is there a simpler design for per device messaging?
Probably yes, re-consider SNS and other messaging systems.
About your statements:
I am aware that SQS can be expensive, but I am using it at this stage
for ease of administration.
"Expensive" is a very context-depend classification, considering that a SQS message can cost $0.00000005.
As far as I can tell SNS requires either an IOS/Android client, or an
HTTP server running on the consumer. This is why I ruled out SNS, and
I am using SQS.
SNS is a push based messaging system (SQS is pull based) that can handle 5 types of subscriptions: smtp, sms, http, mobile push and SQS, so they are not mutually exclusive.
I am going to build a distributed cloud front-end over SQS to handle
client connections. This front-end will just be a wrapper, that will
authenticate clients, and relay them to the SQS queues.
Managing millions of queues can be a overwhelming task for your "distributed cloud front-end over SQS". Unless your project is exactly about queue management, this is probably undifferentiated heavy lifting.
This is about all i can say without knowing your case, but consider that you can use SNS/SQS together with each other and with other messaging software, such as Apache Camel and others, that may help you build your solution or proof of concept.
I think SQS (or SNS if you can eventually use them) are still your best bet, esp if you need "quick response" or "near real time"; however, for the sake of having "alternatives" just so you can compare...
You can consider a giant dynamoDB, with each device/client having it's own "device-id" and perhaps "message-id" as key. This way, your devices can query it's own keys for messages. DynamoDB is meant to handle billions of rows, so this won't stress it much. The querying part, you should be careful though, as you could use up provisioned queries, although at aggregate level, your devices may not all respond/query at the same time, so you may still be ok.
You can also consider having a giant S3 bucket, each folder key'ed to the device id and further keyed into message-id folders. This is a poor man's SQS but it's guaranteed to scale, both in message quantity and number of accesses to it.
In both #1 and #2, if your devices are registered with Cognito for credentials, there's a straightforward way to do policies, so the devices can only access their "own" stuff. However, both alternatives #1 and #2 is likely slower than SQS, esp if you use SQS long-poll -- in long poll, you get responses, as soon as SQS detects a message have been dropped off... These alternatives will require you to wait for next cycle-poll.