Is an SQS needed with a Lambda in this use case? - amazon-web-services

I'm trying to build a flow that allows a user to enter data and it's being stored in RDS. My question is do I need to go from USER -> SQS -> Lambda -> RDS to or is it better to go straight from USER -> Lambda -> RDS which skips the queue entirely. Are there going to be scalability issues with the latter?
I do like that the SQS can retry a large number of times to guarantee the data, but is there a similar way to retry with a lambda alone? It's important that all of the data is stored and done so in a timely manner. I'm struggling to see the tradeoffs of the two scenarios.
If anyone has any input on the situation, that would be amazing.

Are there going to be scalability issues with the latter?
It depends on multiple metrics including traffic, spikes, size of the database, rpm etc.
Putting SQS before lambda provides you to manage number of database queries in t time according to your needs. It is a "queue" and you are consuming that queue. In some business cases it may not be useful(banking transactions etc) but in some cases(analytic calculations) it may be helpful. Instead of making a single insert whenever lambda is invoked, you can set batch size and insert in batch(10 records at once) which reduces the number of queries.
Also you can define dead letter queue to push your problematic data(couldn't make it to database). It will be another queue that you to check later to identify problematic inputs. The document can be found here

Related

Is SQS better than DynamoDB for peak loads?

A service runs on ECS and writes the requested URL to a DynamoDB. Dynamic scaling was activated to keep the costs for DynamoDB from becoming too high. DynamoDB scales slower than requests are coming in at any given time, so some calls are not logged. My question now is whether writing to an SQS would be the better way here, because the documentation says:
Standard queues support a nearly unlimited number of API calls per second, per API action (SendMessage, ReceiveMessage, or DeleteMessage).
Of course, the messages would then have to be written back to DynamoDB, but another service can then do that.
Is the throughput of messages per second to SQS really unlimited, so it's definitely cheaper to send messages to SQS instead of increasing DynamoDB's writes per second?
I don't know if this qualifies for a good answer. But remembering a discussion with my architect at the time, we concluded that to have a queue for precisely this problem seems good practice, regardless of load. It keeps requests even if services go down, so there is an added benefit.
SQS and Dynamo fit two very different use cases. Its not so much which is better, its which is right for what you need.
Dynamodb is a NoSQL Document based Database. This is best for when you have known access patterns to data that needs to persist over time, that you need to access quickly, but probably are not making many changes too (or at least the changes do not have to be absolutely immediately, sub 5 ms accessible). Each document in a dynamodb is similar (but also very different) to a row in a standard SQL table, in that it will have attributes (columns) keys (Partition and Sort Key) and be retrievable through a query (though dynamic on the fly queries are NOT good for Dynamo)
SQS is a Queue system. It has no persistence. Payloads of JSON objects are dropped into the Queue and then processed by some end point - either a Lambda, or put into a dynamo, or something else entirely depending on your products use case. It is perfect for when you often receive bursts of data but your system needs some time to handle each individual payload - such as it is waiting on other systems to finish before it can handle the next one - so instead of scaling horizontally (by just handling all the payloads in parallel) you have to scale vertically (be able to handle more payloads at once through a single or only a few threads). You cannot access the data coming in while it is waiting in the queue, no queries on said data, only wait until that data pops/pushes off the queue and into processing by whatever system you have set up to receive it.
The answer to your question is entirely dependent on your use case and your system - something we here at SO will never really understand or know simply because we will always be hearing about it through you and never really experiencing it. As such, to answer it, you need to understand the capabilities of both Dynamo and SQS, the pros and cons for each, and then determine which is best for your product.

Triggering an AWS Lambda Function based on multiple (1k-10k) schedules

We have an AWS Lambda function which queries some data for a client from our DB and sends a report to the client. Some clients want daily reports, some might need weekly or monthly reports. The number of clients can go up ~1000 and each client might have ~10 such reports.
So we are looking for a way to trigger the Lambda function with different parameters based on schedules set by each client.
For Example:
Client A wants daily report of their data to be sent to abc#clienta.com and Client B wants a weekly report of their data to be sent to xyz#clientb.com. So the Lambda function will be invoked twice on Sunday 12 AM (for both clients) and once on Monday-Saturday 12 AM (for Client A).
We found the following solutions on AWS, but both have some limitations.
Approach 1: Use CloudWatch Events
We can create a CloudWatch Events Rule for each client and each report that could trigger our Lambda function on each schedule.
Pros:
Simple setup, easy to implement.
Cons:
There is a limitation of 100 Event Rules per AWS Account. It's mentioned that we can contact AWS to get it increased, but we are not sure if it can be increased to the number we are looking for (Currently it is ~10k, but we would prefer a solution in which there is no such limit). Also, a limit of 100 per account gives an indication that this is not a suitable solution for such a use case.
Approach 2: Using Step Functions
For each client and each report, we can create one AWS State Machine. We can use the Iterator pattern in Step Functions to wait for a day/week/month and then re-invoke the Lambda Function.
Pros:
No limitations on number of State Machines, so this enables us to scale easily.
Cons:
Step Functions have a limitation that they can run for a year, at maximum. This will be a problem in our case because the users will need to get the reports for a much longer period. There is a way to overcome this in Step Functions. Just before it's about to reach the 1-year limit, we can cancel the execution and start a fresh execution. So overall, this solution looks complex.
Can someone suggest a better solution for this on AWS?
Do you really need a CloudWatch for each client? Why not do something like the following architecture.
Have cloudwatch kick off a lambda that checks schedules for all clients each day (or whatever the most frequent report schedule you allow). You don't want this to take a long time so you just have this check a database (i.e. DynamoDB) of schedules and drop metadata about any reports that need to be generated onto an SQS queue (i.e. type of report, client information, destination email). Worst case, this execute and finds nothing to schedule but this should only takes seconds so the cost is very low to just run this everyday.
Then you have a lambda that actually does the report generator and email that consumes the queue. This report generator lambda will scale and spin up as many instances it needs to handle the messages on the queue. You can set the concurrency limit for the report generator lambda to ensure it doesn't spin up too many at a time if that is a concern once you are having 1000s of clients.
The definition and deployment of all these components can easily be automated via an AWS SAM.
Hope this alternate approach gives you a few more ideas.
You can combine both approach, to get the best result.
step 1: Use stepfunction to run your lambdas.
step 2: Trigger your stepfunction from cloudwatch, based on stepfunction event(SUCCESS,FAILED ETC).
In this way when step 1 fails or completes 1 year run. Cloudwatch event can trigger it back on, based on the json input you pass.

AWS Lambda - Store state of a queue

I'm currently tasked with building a serverless architecture for communication between government agencies and citizens, and a main component is some form of queue that contains some form of object/pointer to each citizens request, sorted by priority. The government workers can then process an element when available. As Lambda is stateless, I need to save the queue outside in some manner.
For saving state I've gathered that you can use DynamoDB or S3 Buckets and use event triggers to invoke related Lambda methods. Some also suggest using Parameter Store to save some state variables. Storing things globally has also come up, though as you can't guarantee that the Lambda doesn't terminate, it doesn't seem like a good idea.
Finally, I've also read a bit about SQS, though I have no idea if it is at all applicable to this case.
What is the best-practice / suggested approach when working with Lambda in this way? I'm leaning towards S3 Buckets, due to event triggering, and not using DynamoDB as our DB.
Storing things globally has also come up, though as you can't guarantee that the Lambda doesn't terminate, it doesn't seem like a good idea.
Correct -- this is not viable at all. Note that what you are actually referring to when you say "the Lambda" is the process inside the container... and any time your Lambda function is handling more than one invocation concurrently, you are guaranteed that they will not be running in the same container -- so "global" variables are only useful for optimization, not state. Any two concurrent invocations of the same function have two entirely different global environments.
Forgetting all about Lambda for a moment -- I am not saying don't use Lambda; I'm saying that whether or not you use Lambda isn't relevant to the rest of what is written, below -- I would suggest that parallel/concurrent actions in general are perhaps one of the most important factors that many developers tend to overlook when trying to design something like you are describing.
How you will assign work from this work "queue" is extremely important to consider. You can't just "find the next item" and display it to a worker.
You must have a way to do all of these things:
finding the next item that appears to be available
verify that it is indeed available
assign it to a specific worker
mark it as unavailable for assignment
Not only that, but you have to be able to do all of these things atomically -- as a single logical action -- and without collisions.
A naïve implementation runs the risk of assigning the same work item to two or more people, with the first assignment being blindly and silently overwritten by subsequent assignments that happen at almost the same time.
DynamoDB allows conditional updates -- update a record if and only if a certain condition is true. This is a critical piece of functionality that your solution needs to accommodate -- for example, assign work item x to user y if and only if item x is currently unassigned. A conditional update will fail, and changes nothing, if the condition is not true at the instant the update happens and therein lies the power of the feature.
S3 does not support conditional updates, because unlike DynamoDB, S3 operates only on an eventual-consistency model in most cases. After an object in S3 is updated or deleted, there is no guarantee that the next request to S3 will return the most recent version or that S3 will not return an item that has recently been deleted. This is not a defect in S3 -- it's an optimization -- but it makes S3 unsuited to the "work queue" aspect.
Skip this consideration and you will have a system that appears to work, and works correctly much of the time... but at other times, it "mysteriously" behaves wrongly.
Of course, if your work items have accompanying documents (scanned images, PDF, etc.), it's quite correct to store them in S3... but S3 is the wrong tool for storing "state." SSM Parameter Store is the wrong tool, for the same reason -- there is no way for two actions to work cooperatively when they both need to modify the "state" at the same time.
"Event triggers" are useful, of course, but from your description, the most notable "event" is not from the data, or the creation of the work item, but rather it is when the worker says "I'm ready for my next work item." It is at that point -- triggered by the web site/application code -- when the steps above are executed to select an item and assign it to a worker. (In practice, this could be browser → API Gateway → Lambda). From your description, there may be no need for the creation of a new work item to trigger an "event," or if there is, it is not the most significant among the events.
You will need a proper database for this. DynamoDB is a candidate, as is RDS.
The queues provided by SQS are designed to decouple two parts of your application -- when two processes run at different speeds, SQS is used as a buffer, allowing X to safely store the work needing to be done and then continue with something else until Y is able to do the work. SQS queues are opaque -- you can't introspect what's in the queue, you just take the next message and are responsible for handling it. On its face, that seems to partially describe what you need, but it is not a clean match for this use case. Queues are limited in how long messages can be retained, and once a message is successfully processed, it is completely gone.
Note also that SQS is only a match to your use case with the FIFO queue feature enabled, which guarantees perfect in-order delivery and exactly-once delivery -- standard SQS queues, for performance optimization reasons, do not guarantee perfect in-order delivery and may under certain conditions deliver the same message more than once, to the same consumer or a different consumer. But the SQS FIFO queue feature does not coexist with event triggers, which require standard queues.
So SQS may have a role, but you need an authoritative database to store the work and the results of the business process.
If you need to store the message, then SQS is not the best tool here, because your Lambda function would then need to process the message and finally store it somewhere, making SQS nothing but a broker.
The S3 approach gives what you need out of the box, considering you can store the files (messages) in an S3 bucket and then have one Lambda consume its event. Your Lambda would then process this event and the file would remain safe and sound on S3.
If you eventually need multiple consumers for this message, then you can send the S3 event to SNS instead and finally you could subscribe N Lambda Functions to a given SNS topic.
You appear to be worrying too much about the infrastructure at this stage and not enough on the application design. The fact that it will be serverless does not change the basic functionality of the application — it will still present a UI to users, they will still choose options that must trigger some business logic and information will still be stored in a database.
The queue you describe is merely a datastore of messages that are in a particular state. The application will have some form of business logic for determining the next message to handle, which could be based on creation timestamp, priority, location, category, user (eg VIP users who get faster response), specialization of the staff member asking for the next message, etc. This is not a "queue" but rather a calculation to be performed against all 'unresolved' messages to determine the next message to assign.
If you wish to go serverless, then the back-end will certainly be using Lambda and a database (eg DynamoDB or Amazon RDS). The application should store everything in the database so that data is available for the application's business logic. There is no need to use SQS since there really isn't a "queue", and Parameter Store is merely a way of sharing parameters amongst application components — it is not meant for core data storage.
Determine the application functionality first, then determine the appropriate architecture to make it happen.

AWS Event-Sourcing implementation

I'm quite a newbe in microservices and Event-Sourcing and I was trying to figure out a way to deploy a whole system on AWS.
As far as I know there are two ways to implement an Event-Driven architecture:
Using AWS Kinesis Data Stream
Using AWS SNS + SQS
So my base strategy is that every command is converted to an event which is stored in DynamoDB and exploit DynamoDB Streams to notify other microservices about a new event. But how? Which of the previous two solutions should I use?
The first one has the advanteges of:
Message ordering
At least one delivery
But the disadvantages are quite problematic:
No built-in autoscaling (you can achieve it using triggers)
No message visibility functionality (apparently, asking to confirm that)
No topic subscription
Very strict read transactions: you can improve it using multiple shards from what I read here you must have a not well defined number of lamdas with different invocation priorities and a not well defined strategy to avoid duplicate processing across multiple instances of the same microservice.
The second one has the advanteges of:
Is completely managed
Very high TPS
Topic subscriptions
Message visibility functionality
Drawbacks:
SQS messages are best-effort ordering, still no idea of what they means.
It says "A standard queue makes a best effort to preserve the order of messages, but more than one copy of a message might be delivered out of order".
Does it means that giving n copies of a message the first copy is delivered in order while the others are delivered unordered compared to the other messages' copies? Or "more that one" could be "all"?
A very big thanks for every kind of advice!
I'm quite a newbe in microservices and Event-Sourcing
Review Greg Young's talk Polygot Data for more insight into what follows.
Sharing events across service boundaries has two basic approaches - a push model and a pull model. For subscribers that care about the ordering of events, a pull model is "simpler" to maintain.
The basic idea being that each subscriber tracks its own high water mark for how many events in a stream it has processed, and queries an ordered representation of the event list to get updates.
In AWS, you would normally get this representation by querying the authoritative service for the updated event list (the implementation of which could include paging). The service might provide the list of events by querying dynamodb directly, or by getting the most recent key from DynamoDB, and then looking up cached representations of the events in S3.
In this approach, the "events" that are being pushed out of the system are really just notifications, allowing the subscribers to reduce the latency between the write into Dynamo and their own read.
I would normally reach for SNS (fan-out) for broadcasting notifications. Consumers that need bookkeeping support for which notifications they have handled would use SQS. But the primary channel for communicating the ordered events is pull.
I myself haven't looked hard at Kinesis - there's some general discussion in earlier questions -- but I think Kevin Sookocheff is onto something when he writes
...if you dig a little deeper you will find that Kinesis is well suited for a very particular use case, and if your application doesn’t fit this use case, Kinesis may be a lot more trouble than it’s worth.
Kinesis’ primary use case is collecting, storing and processing real-time continuous data streams. Data streams are data that are generated continuously by thousands of data sources, which typically send in the data records simultaneously, and in small sizes (order of Kilobytes).
Another thing: the fact that I'm accessing data from another
microservice stream is an anti-pattern, isn't it?
Well, part of the point of dividing a system into microservices is to reduce the coupling between the capabilities of the system. Accessing data across the microservice boundaries increases the coupling. So there's some tension there.
But basically if I'm using a pull model I need to read
data from other microservices' stream. Is it avoidable?
If you query the service you need for the information, rather than digging it out of the stream yourself, you reduce the coupling -- much like asking a service for data rather than reaching into an RDBMS and querying the tables yourself.
If you can avoid sharing the information between services at all, then you get even less coupling.
(Naive example: order fulfillment needs to know when an order has been paid for; so it needs a correlation id when the payment is made, but it doesn't need any of the other billing details.)

AWS Lambda - how to identify duplicate messages

Since several of the triggers for AWS Lambda can only guarantee message delivery "at least once" (SQS and IoT with QoS=1), I wonder what's the best way to identify a duplicate message and ignore it.
I can see that I currently get several duplicate messages, triggering my lambdas twice, causing noise and invalid data as a consequence.
In my client, I solve it by just storing a list of message IDs that I've processed, but in the Lambdas, I have nowhere to store a state.
Of course I could maintain a DB table of processed message IDs but it seems like overkill to me (and probably adds extra billed runtime to the lambdas). A simple key/value store service in memory would be enough.
What other solutions are you guys using?
I know you don't want to use a DB but dynamodb can work well for this kind of thing. If you have something you can use as a good partition key then it will still be quite performant. It will still add a very small amount of time to your lambda run time and, of course, you will be charged for your dynamodb capacity & data. I use this successfully to discard duplicate messages.
The other thing that might be worth looking into would be elasticache which has memcached and redis versions. This would be faster - if performance is a particular focus - but is not persistent like DynamoDB.