How to implement resiliency (retry) in a nested service call chain

We have a webpage that queries an item from an API gateway which in turn calls a service that calls another service and so on.
Webpage --> API Gateway --> service#1 --> service#2 --> data store (RDBMS, S3, Azure Blob)
We want to make the operation resilient so we added a retry mechanism at every layer.
Webpage --retry--> API Gateway --retry--> service#1 --retry--> service#2 --retry--> data store.
This however could cause a cascading failure: if the data store doesn't respond in time, every layer will time out and retry. In other words, if each layer has the same connection timeout and is configured to retry 3 times, then there will be a total of 81 retries to the data store (3 × 3 × 3 × 3 = 81 across the four layers), which is called a retry storm.
One way to fix this is to increase the timeout at each layer in order to give the layer below time to retry.
Webpage --5m timeout--> API Gateway --2m timeout--> service#1
This however is unacceptable because the timeout at the webpage will be too long.
How should I address this problem?
Should there only be one layer that retries? Which layer? And how can the layer know if the error is transient?

A couple of possible solutions (and you can/should use both) would be to retry only on specific conditions and to implement rate limiters/circuit breakers.
Retry On is a technique where you don't retry on every condition, but only on specific conditions. This could be a specific error code or a specific header value. E.g. in your current situation, DO NOT retry on timeouts; only retry on server failures. In addition, you could have each layer retry on different conditions.
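As a rough sketch of the Retry On idea (Python; call_downstream is a hypothetical function returning a status code and body), retrying only on server failures and never on timeouts might look like this:

import random
import time

RETRYABLE_STATUS = {500, 502, 503}  # retry on server failures only

def call_with_retry(call_downstream, max_attempts=3):
    for attempt in range(max_attempts):
        try:
            status, body = call_downstream()
        except TimeoutError:
            raise  # do NOT retry on timeouts; surface them immediately
        if status not in RETRYABLE_STATUS:
            return status, body
        if attempt < max_attempts - 1:
            # exponential backoff with jitter before the next attempt
            time.sleep(2 ** attempt + random.random())
    return status, body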
Rate limiting would be to put either a local or a global rate limiter service inline on the connections. This helps short-circuit the thundering herd in the case that it starts up. E.g. rate limit the data layer to X req/s (insert real values here) and the gateway to Y req/s; then even if a service attempts lots of retries, they won't propagate far down the chain. Similar to this is circuit breaking, where each layer only permits X active connections to any downstream; this is just another way to slow those retry storms.
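A minimal sketch of that circuit-breaker-style cap on active connections (Python; all names hypothetical): reject immediately, rather than queueing, once the cap is reached.

import threading

class ConcurrencyBreaker:
    # permit at most max_active concurrent calls to a downstream
    def __init__(self, max_active):
        self._slots = threading.Semaphore(max_active)

    def call(self, fn, *args, **kwargs):
        # fail fast instead of piling up blocked callers
        if not self._slots.acquire(blocking=False):
            raise RuntimeError("breaker open: too many active downstream calls")
        try:
            return fn(*args, **kwargs)
        finally:
            self._slots.release()

# usage: breaker = ConcurrencyBreaker(max_active=50)
#        breaker.call(call_downstream)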

Related

Rate Exceeded on AWS Lambda Using API Gateway and serverless framework

When I try to invoke a method that has an HTTP event, it results in a 500 Internal server error.
On CloudWatch logs it shows Recoverable error occurred (Rate Exceeded.)
When I invoke the function directly, without the HTTP event, it executes and returns a response.
Here is my serverless config:
You have set your Lambda's reservedConcurrency to 0. This will prevent your Lambda from ever being invoked. Setting it to 0 is usually useful when your functions are getting invoked but you're not sure why and you want to stop it right away.
If you want to have it invoked, change reservedConcurrency to a positive integer (by default, it can be a positive integer <= 1000, but you can increase this limit by contacting AWS) or simply remove the reservedConcurrency attribute from your .yml file as it will use the default values.
Why would one ever use reservedConcurrency anyway? Well, let's say your Lambda functions are triggered by requests from API Gateway. Let's say you get 400 requests/second at peak hours and, upon every request, two other Lambda functions are triggered: one to generate a thumbnail for a given image and one to insert some metadata in DynamoDB. You'd have, in theory, 1200 Lambda functions running at the same time (given all of your Lambda functions finish their execution in less than a second). This would lead to throttling, as the default concurrent execution limit for Lambda functions is 1000.

But is the thumbnail generation as important as the requests coming from API Gateway? Very likely not, as it's naturally an eventually consistent task, so you could set reservedConcurrency on the thumbnail Lambda to only 200. That way you wouldn't use up your concurrency, meaning other functions would be able to spin up to do something more useful at a given point in time (in our example, receiving HTTP requests is more important than generating thumbnails). The remaining 800 concurrency could then be split between the function triggered from API Gateway and the one that inserts data into DynamoDB, thus preventing throttling for the important stuff and keeping the not-so-important stuff eventually consistent.
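If you'd rather manage this outside the .yml, here is a sketch using the AWS SDK for Python (boto3); the function name is a placeholder:

import boto3

lambda_client = boto3.client("lambda")

# cap the low-priority thumbnail function at 200 concurrent executions
lambda_client.put_function_concurrency(
    FunctionName="thumbnail-generator",  # hypothetical function name
    ReservedConcurrentExecutions=200,
)

# or remove the cap so the function draws from the unreserved pool
lambda_client.delete_function_concurrency(FunctionName="thumbnail-generator")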

Throttling of Flow not working when created from Route

Consider routes containing all the HTTP services
val routes: Route = ...
I wish to throttle the number of requests, so I used Route.handleFlow(routes) to create a flow and called the throttle method with a finite duration.
Finally, I created HTTP binding using
Http().bindAndHandle(flowObjectAfterThrottling, hostname, port)
When HTTP requests are fired from a loop, throttling is not obeyed by akka.
One possibility is that the HTTP requests being "fired from a loop" may be using separate connections. Each incoming connection is throttled at the appropriate rate, but the aggregate throughput across all connections is higher than expected.
Use Configurations Instead
You don't need to write software to set limiting rates for your Route.
If you are only concerned with consumption of a resource, such as disk or RAM, then you can remove the rate logic and use akka configuration settings instead:
# The maximum number of concurrently accepted connections when using the
# `Http().bindAndHandle` methods.
max-connections = 1024
# The maximum number of requests that are accepted (and dispatched to
# the application) on one single connection before the first request
# has to be completed.
pipelining-limit = 16
This doesn't provide the ability to set a maximum frequency, but it does at least allow for the specification of a maximum concurrent usage, which is usually sufficient for resource protection.

Azure Service Bus Topic: paired namespaces or retry

We are using an Azure Service Bus Topic in workflow manager (approval process). We don't want to lose or duplicate messages when we push messages to the service bus topic. Now there are two options:
a. Use retry only.
b. Use a paired service bus only, without retry.
As we cannot use both together: assume that during a message push the primary service bus is not available, so the message is pushed to the paired service bus, and when the primary becomes available again the message is automatically pushed to the primary. But if we use retry, the retry will try to push the message to the primary, and since the primary service bus is not available the message will go to the paired service bus as well, so there is a chance of processing duplicate messages.
Which is the better option, "a" or "b", to push messages to the service bus for the given problem statement?
Both options have their pros and cons.
With Paired Namespaces you get the ability to continue sending messages while your primary namespace is down. But don't get fooled: you only store those messages while the primary namespace is down. They are not retried by the receiver. Other drawbacks include:
No good testability.
Increased cost (you send to the secondary, retrieve back from it to send to the primary).
Failover to the secondary is not very intuitive. You have to manually retry the message after a failure; it does not automatically switch to the secondary namespace.
Have a look at this post for more details.
With the retry approach you gain simplicity, and it's something you'd need to do anyway: with Azure Service Bus, operations can fail with intermittent exceptions and you should retry regardless. The drawback of having only retries is that it doesn't protect from outages. That's why you could combine it with a secondary namespace using a custom implementation, but that's a whole different can of worms. Libraries like NServiceBus provide a custom implementation you can get the idea from.
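As a sketch of that retry-plus-custom-secondary combination using the azure-servicebus Python SDK (connection strings and topic name are placeholders, and this deliberately ignores the duplicate-detection question):

from azure.servicebus import ServiceBusClient, ServiceBusMessage
from azure.servicebus.exceptions import ServiceBusError

PRIMARY_CONN = "<primary namespace connection string>"
SECONDARY_CONN = "<secondary namespace connection string>"
TOPIC = "approvals"  # hypothetical topic name

def send_with_fallback(payload):
    # the SDK retries intermittent failures itself (retry_total);
    # only when those retries are exhausted do we fail over
    try:
        with ServiceBusClient.from_connection_string(PRIMARY_CONN, retry_total=3) as client:
            with client.get_topic_sender(topic_name=TOPIC) as sender:
                sender.send_messages(ServiceBusMessage(payload))
    except ServiceBusError:
        with ServiceBusClient.from_connection_string(SECONDARY_CONN) as client:
            with client.get_topic_sender(topic_name=TOPIC) as sender:
                sender.send_messages(ServiceBusMessage(payload))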

AWS API Gateway Cache - Multiple service hits with burst of calls

I am working on a mobile app that will broadcast a push message to hundreds of thousands of devices at a time. When each user opens their app from the push message, the app will hit our API for data. The API resource will be identical for each user of this push.
Now let's assume that all 500,000 users open their app at the same time. API Gateway will get 500,000 identical calls.
Because all 500,000 nearly concurrent requests are asking for the same data, I want to cache it. But keep in mind that it takes about 2 seconds to compute the requested value.
What I want to happen
I want API Gateway to see that the data is not in the cache, let the first call through to my backend service while the other requests are held in queue, populate the cache from the first call, and then respond to the other 499,999 requests using the cached data.
What is (seems to be) happening
API Gateway, seeing that there is no cached value, is sending every one of the 500,000 requests to the backend service! So I will be recomputing the value with some complex db query way more times than resources will allow. This happens because the last call comes into API Gateway before the first call has populated the cache.
Is there any way I can get this behavior?
I know that, based on my example, I could perhaps prime the cache by invoking the API call myself just before broadcasting the bulk push job, but the actual use-case is slightly more complicated than my simplified example. But rest assured, solving this simplified use-case will solve what I am trying to do.
If you anticipate that kind of burst concurrency, priming the cache yourself is certainly the best option. Have you also considered adding throttling to the stage/method to protect your backend from a large surge in traffic? Clients could be instructed to retry on throttles and they would eventually get a response.
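For example, priming could be as simple as hitting the endpoint once before the broadcast goes out (Python sketch; the URL and push fan-out function are placeholders):

import requests

API_URL = "https://example.execute-api.us-east-1.amazonaws.com/prod/data"  # placeholder

def broadcast_push(message):
    # warm the API Gateway cache first so the expensive ~2 s
    # computation runs once instead of 500,000 times
    requests.get(API_URL, timeout=10).raise_for_status()
    send_push_to_all_devices(message)  # hypothetical fan-out helper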
I'll bring your feedback and proposed solution to the team and put it on our backlog.

When is it necessary to queue the requests from the client

I have heard that there is a limit to the number of requests a server can process.
So if the number of requests from clients is larger than that limit, people queue the requests.
So I have two problems:
1 When
How do I decide whether it is necessary to queue the requests? How do I measure the largest number?
2 How
If the queue is unavoidable, where should the queuing be done?
For a J2EE application using Spring Web MVC as the framework, I want to know whether the queue should be put in the Controller, the Model, or the DAO.
3 Is there an approach that avoids the queue while still providing the service?
First you have to establish what the limit at the server actually is. It's likely a limit on the frequency of messages, i.e. maybe you're limited to sending 10 requests a second. If that's the case then you would need to keep a count of how many messages you've sent out in the current second; before you send out a request, check whether you would breach this limit, and if so make the thread wait until the second is up. If not, you're free to send the request. This thread would be reading from a queue of outbound messages.
If the server limit is determined in another way, i.e. dynamically based on its current load, which sounds like it might be true in your case, there must be a continuous feed of request limits which you must process to determine the current limit. Once you have this limit you can process the requests in the same way as mentioned in the first paragraph.
As for where to put the queue and the associated logic, I'd put it in the controller.
I don't think there is a way to avoid the queue. You are forced to throttle your requests, and therefore you must queue your outbound requests internally so that they are not lost and will be processed at some point in the future.
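A minimal sketch of the counting approach from the first paragraph (Python here for brevity, even though the question is about Spring MVC; send_request and the limit are hypothetical):

import queue
import time

MAX_PER_SECOND = 10  # hypothetical server limit
outbound = queue.Queue()

def sender_loop(send_request):
    # send_request is a hypothetical function that performs one request
    window_start = time.monotonic()
    sent_in_window = 0
    while True:
        request = outbound.get()
        now = time.monotonic()
        if now - window_start >= 1.0:
            window_start, sent_in_window = now, 0
        if sent_in_window >= MAX_PER_SECOND:
            # limit reached: wait until the current second is up
            time.sleep(1.0 - (now - window_start))
            window_start, sent_in_window = time.monotonic(), 0
        send_request(request)
        sent_in_window += 1

# producers call outbound.put(request); run sender_loop on a background thread.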