Optimization and load balancing of a microservice-based backend - amazon-web-services

I have a client with a pretty popular ticket-selling service, to the point that the microservice-based backend is struggling to keep up. I need to come up with a solution to optimize and load-balance the system. The infrastructure works through a series of interconnected microservices.
When a user enters the sales channels (mobile or web app), the request is directed to an AWS API Gateway, which orchestrates communication with the microservice responsible for obtaining the requested resources.
These resources are provided by a third-party API.
The third party has physical servers at each venue that synchronize information between the POS systems and the digital sales channels.
We have a Redis instance that caches the requests we make to the third-party API; each endpoint is cached with a TTL based on how frequently its information is updated.
Here is some background info:
We get traffic mostly from two major countries.
On a normal day, about 100,000 users use the service, with a 70%/30% traffic split between the two countries.
On important days, each country has different opening hours (country A starts sales at 10 am UTC, country B at 5 pm UTC), and on these days traffic increases by some factor n.
We have a main middleware through which all client requests are processed.
We have a Redis cache that stores GET responses with a different TTL for each endpoint.
The middleware decides whether to serve a request from the cache or to call the third party's API, as the case may be.
And these are the complaints I have gotten that need to be dealt with:
When one country receives a high volume of requests, the country with less traffic is negatively affected: clients do not respond, or respond only partially, because the computation layer's limit was exceeded, and users have a bad experience.
Every time this happens, the computation layer must be scaled up manually from the infrastructure side.
Different requests have very different response times: stadiums respond in roughly 40 seconds and movie theaters in about 3 seconds. These requests enter a single queue and are answered in order of arrival.
Error handling is not clear. Errors are mixed together, and you can't tell which country they come from or how many there are.
Responses from the third-party API are not cached correctly, since error responses are stored for the full TTL.
I was thinking of a couple of things I could suggest:
Adding request instrumentation using AWS X-Ray
Adding a separate table for errors in the Redis cache layer (stale data is better than no data for the end user)
Adding AWS Elastic Load Balancing in front of the main middleware
But I'm not sure how realistic it would be to implement these three things, or whether they would even solve the problem; I personally don't have much experience optimizing this type of backend. I would appreciate any suggestions, recommendations, links, documentation, etc. I'm really desperate for a solution to this problem.

A few thoughts:
When one country receives a high volume of requests, the country with less traffic is negatively affected: clients do not respond, or respond only partially, because the computation layer's limit was exceeded, and users have a bad experience.
A common approach in AWS is to regionalize the stack; assuming you are using CDK/CloudFormation, creating a regionalized stack should be a straightforward task.
But it is questionable whether this alone will solve the problem. Your system suffers from availability issues, and regionalization will only isolate the problem per region, so we should be able to do better (see below).
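As a rough illustration, here is a minimal CDK sketch of deploying one copy of the stack per country/region, so a burst in one region cannot exhaust the compute of the other. The stack name, account id, and regions are placeholders, not your real setup.

```typescript
import { App, Stack, StackProps } from 'aws-cdk-lib';
import { Construct } from 'constructs';

class TicketingStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);
    // ... API Gateway, middleware service, Redis cache, etc. go here ...
  }
}

const app = new App();

// One copy of the stack per country/region.
new TicketingStack(app, 'TicketingCountryA', {
  env: { account: '111111111111', region: 'us-east-1' },
});
new TicketingStack(app, 'TicketingCountryB', {
  env: { account: '111111111111', region: 'sa-east-1' },
});
```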
Every time this happens, the computation layer must be scaled up manually from the infrastructure side.
AWS can automatically scale the computation layer up and down based on traffic patterns. This is a neat feature, provided you set limits to make sure you are not overcharged.
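For example, assuming the middleware runs as an ECS Fargate service (adjust if you use EC2, EKS, or Lambda), a hedged CDK sketch of target-tracking auto scaling could look like this:

```typescript
import * as ecs from 'aws-cdk-lib/aws-ecs';

declare const service: ecs.FargateService; // your existing middleware service

const scaling = service.autoScaleTaskCount({
  minCapacity: 2,   // keep a baseline for the quieter country
  maxCapacity: 20,  // hard ceiling so you are not overcharged
});

// Add tasks when average CPU stays above 60%, remove them when it drops.
scaling.scaleOnCpuUtilization('CpuScaling', {
  targetUtilizationPercent: 60,
});
```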
Different requests have very different response times: stadiums respond in roughly 40 seconds and movie theaters in about 3 seconds. These requests enter a single queue and are answered in order of arrival.
It seems the large variance comes from having to contact the servers at the venues. I recommend decoupling that activity: calls to the venues should be made asynchronously. There are several ways to do that; queues plus client push/pull are the usual approaches (please comment if more details are needed, but this is a fairly standard problem with lots of material on the internet).
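A minimal sketch of the queue approach, assuming SQS; the queue URL, message shape, and handler names are placeholders, not your real code:

```typescript
import { randomUUID } from 'node:crypto';
import { SQSClient, SendMessageCommand } from '@aws-sdk/client-sqs';

const sqs = new SQSClient({});
const VENUE_QUEUE_URL = process.env.VENUE_QUEUE_URL!; // hypothetical queue

// API handler: instead of blocking ~40s on a stadium, enqueue the work and
// return a job id the client can poll (or be notified about via push).
export async function requestVenueData(venueId: string): Promise<{ jobId: string }> {
  const jobId = randomUUID();
  await sqs.send(new SendMessageCommand({
    QueueUrl: VENUE_QUEUE_URL,
    MessageBody: JSON.stringify({ jobId, venueId }),
  }));
  return { jobId }; // a worker consumes the queue, calls the venue, and caches the result
}
```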
Error handling is not clear. Errors are mixed together, and you can't tell which country they come from or how many there are.
That's a code fix, assuming you do send data to CloudWatch (do you?). You could attach the country as context to every request, via a filter or something similar, so that when an error is logged that context is logged with it. You probably need the venue id even more than the country, since you can derive the country from the venue id.
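A hedged sketch of what that could look like, assuming an Express-based middleware whose stdout is shipped to CloudWatch Logs; the header and field names are assumptions:

```typescript
import express from 'express';

const app = express();

// Derive country/venue once per request and stash it on res.locals.
app.use((req, res, next) => {
  res.locals.venueId = req.header('x-venue-id') ?? 'unknown';
  res.locals.country = req.header('cloudfront-viewer-country') ?? 'unknown';
  next();
});

// Structured error log: CloudWatch Logs Insights can then group by
// country/venueId to show where errors come from and how many there are.
app.use((err: Error, req: express.Request, res: express.Response, next: express.NextFunction) => {
  console.error(JSON.stringify({
    level: 'error',
    message: err.message,
    country: res.locals.country,
    venueId: res.locals.venueId,
    path: req.path,
  }));
  res.status(502).json({ error: 'upstream_failure' });
});
```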
Responses from the third-party API are not cached correctly, since error responses are stored for the full TTL.
Don't store errors + add a circuit breaker pattern.
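A minimal sketch of a cache-aside flow that never caches errors and trips a simple circuit breaker; the Redis client usage, key names, and thresholds are assumptions, not your existing middleware code:

```typescript
import { createClient } from 'redis';

const redis = createClient(); // assumes redis.connect() is awaited at startup
let consecutiveFailures = 0;
let circuitOpenUntil = 0; // epoch ms

export async function getWithCache(
  key: string,
  ttlSeconds: number,
  fetchFromThirdParty: () => Promise<string>,
): Promise<string | null> {
  const cached = await redis.get(key);
  if (cached !== null) return cached;

  // Circuit open: don't hammer the third party, let the caller serve stale data.
  if (Date.now() < circuitOpenUntil) return null;

  try {
    const fresh = await fetchFromThirdParty();
    consecutiveFailures = 0;
    // Only successful responses get a TTL; errors are never written to the cache.
    await redis.set(key, fresh, { EX: ttlSeconds });
    return fresh;
  } catch (err) {
    consecutiveFailures += 1;
    if (consecutiveFailures >= 5) {
      circuitOpenUntil = Date.now() + 30_000; // back off for 30 seconds
    }
    return null; // caller decides: stale data, default payload, or error response
  }
}
```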

Related

How to handle long requests in Google Cloud Run?

I have hosted my Node app in Cloud Run, and all of my requests are served within 300-600 ms. But one endpoint gets data from a 3rd-party service, so that request takes 1.2 s - 2.5 s to complete.
My questions regarding this are:
Are 1.2 s - 2.5 s requests suitable for Cloud Run? Or is there any rule that requests should be completed within xx ms?
Also, see the screenshot: I got a message along with the request in the logs, "The request caused a new container instance to be started and may thus take longer and use more CPU than a typical request".
What caused a new container instance to be started?
Is there any alternative or workaround to handle long requests?
Any advice / suggestions would be greatly appreciated.
Thanks in advance.
I don't think that will be an issue unless you're worried about the cost of the CPU/memory time, which honestly should only matter if you're getting 10k+ requests/day. So it probably doesn't matter, and Cloud Run can handle that just fine (my own app serves requests longer than that with no problem).
It's possible that your service was "scaled to zero", meaning that there were no containers left running to serve requests. In that case, it would be necessary to start up a new instance and wait for whatever initialization/startup costs are associated with that process. It's also possible that it was auto-scaled because all other instances were at their request limits. Make sure that your setting for max concurrent requests per instance is set greater than one - Node/Express can handle multiple requests at once. Plus, you'll only get charged for the total time spent, not per request.
In situations where you get very long (30 seconds, minutes+) operations, it may be a good idea to switch to some different data transfer method. You could use polling, where the client makes a request every 5 seconds and checks if the response is ready. You could also switch to some kind of push-based system like WebSockets, but Cloud Run doesn't have support for that.
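A small sketch of the polling approach, from the client's point of view; the endpoint paths and response shapes are assumptions:

```typescript
// Kick off the slow job, then poll every 5 seconds until the result is ready.
async function fetchSlowResult(baseUrl: string): Promise<unknown> {
  const start = await fetch(`${baseUrl}/jobs`, { method: 'POST' });
  const { jobId } = (await start.json()) as { jobId: string };

  for (;;) {
    const res = await fetch(`${baseUrl}/jobs/${jobId}`);
    const body = (await res.json()) as { status: string; result?: unknown };
    if (body.status === 'done') return body.result;
    await new Promise((resolve) => setTimeout(resolve, 5000));
  }
}
```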
TL;DR: longer requests (~10-30 seconds) should be fine unless you're worried about the cost of the increased compute time that may occur at scale.

Delayed SES Stats Updates

I am noticing that AWS SES stats are not being updated in real time. After sending an email, it takes time for the sent count to increase on the SES dashboard. Sometimes it takes a few minutes, and sometimes it takes longer.
Has anyone else experienced this? Any thoughts?
On the assumption that the console is simply making a call to a standard API action (rather than using some kind of console-only backend service that is not documented or user-accessible -- such things are not unheard-of, but are pretty rare in AWS, so it's a reasonably safe assumption), it looks like this is not really designed to be real-time. The stats are reported in 15-minute windows.
From the SES API reference:
GetSendStatistics
Returns the user's sending statistics. The result is a list of data points, representing the last two weeks of sending activity.
Each data point in the list contains statistics for a 15-minute interval.
— http://docs.aws.amazon.com/ses/latest/APIReference/API_GetSendStatistics.html
AWS SES dashboard stats are only a rough hint of performance, not something to rely on. If you want real-time notifications of sent emails, you will need to set up SNS notifications. Keep in mind that spam-complaint notifications can take up to a couple of days, as they are based on information the ISP provides to Amazon. And complaints from within Gmail's closed ecosystem will never reach you.
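If pulling the 15-minute data points programmatically is enough for your case, a small sketch with the AWS SDK for JavaScript v3 could look like this (the post-processing is just an illustration):

```typescript
import { SESClient, GetSendStatisticsCommand } from '@aws-sdk/client-ses';

const ses = new SESClient({});

async function latestSendStats() {
  const { SendDataPoints } = await ses.send(new GetSendStatisticsCommand({}));
  // Each data point covers a 15-minute interval over the last two weeks.
  const sorted = (SendDataPoints ?? []).sort(
    (a, b) => (a.Timestamp?.getTime() ?? 0) - (b.Timestamp?.getTime() ?? 0),
  );
  return sorted.at(-1); // most recent interval: DeliveryAttempts, Bounces, Complaints, Rejects
}
```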

AWS API Gateway Cache - Multiple service hits with burst of calls

I am working on a mobile app that will broadcast a push message to hundreds of thousands of devices at a time. When each user opens their app from the push message, the app will hit our API for data. The API resource will be identical for each user of this push.
Now let's assume that all 500,000 users open their app at the same time. API Gateway will get 500,000 identical calls.
Because all 500,000 nearly concurrent requests are asking for the same data, I want to cache it. But keep in mind that it takes about 2 seconds to compute the requested value.
What I want to happen
I want API Gateway to see that the data is not in the cache, let the first call through to my backend service while the other requests are held in queue, populate the cache from the first call, and then respond to the other 499,999 requests using the cached data.
What is (seems to be) happening
API Gateway, seeing that there is no cached value, is sending every one of the 500,000 requests to the backend service! So I will be recomputing the value with some complex db query way more times than resources will allow. This happens because the last call comes into API Gateway before the first call has populated the cache.
Is there any way I can get this behavior?
I know that, based on my example, I could perhaps prime the cache by invoking the API call myself just before broadcasting the bulk push job, but the actual use case is slightly more complicated than my simplified example. But rest assured, solving this simplified use case will solve what I am trying to do.
If you anticipate that kind of burst concurrency, priming the cache yourself is certainly the best option. Have you also considered adding throttling to the stage/method to protect your backend from a large surge in traffic? Clients could be instructed to retry on throttles and they would eventually get a response.
I'll bring your feedback and proposed solution to the team and put it on our backlog.
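For what it's worth, a hedged sketch of the cache-priming suggestion: hit the endpoint once before broadcasting, so the 2-second computation happens ahead of the burst. The URL and push-sending function are placeholders.

```typescript
async function sendBulkPush(endpointUrl: string, sendPush: () => Promise<void>): Promise<void> {
  // 1. Warm the API Gateway cache (the slow backend call happens exactly once here).
  const warmup = await fetch(endpointUrl);
  if (!warmup.ok) throw new Error(`cache priming failed: ${warmup.status}`);

  // 2. Only then broadcast the push; the follow-up requests are served from the cache.
  await sendPush();
}
```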

Amazon EC2 EBS I/O costs

I am hosting my application on Amazon EC2, on one of their micro Linux instances.
It costs (apart from other costs) $0.11 per 1 million I/O requests. I was wondering how many I/O requests it takes when I have, say, 1,000 users using it for, say, 1 hour per day for 1 month.
I guess my main concern is: if a hacker keeps hitting my login page (simple HTML), will it increase the I/O request count? I guess yes, as the server needs to do something to serve that page every time.
There are a lot of factors that will impact your I/O requests; as @datasage says, try it and see how it behaves under your scenario. Micro Linux instances are incredibly cheap to begin with, but if you are really concerned, set up a billing alert that will notify you when your usage passes a pre-determined threshold - if it suddenly spikes up, you can take some action to shut it down if that is what you want.
https://portal.aws.amazon.com/gp/aws/developer/account?ie=UTF8&action=billing-alerts
Take a look at CloudWatch, and (for free) set up a VolumeWriteOps and VolumeReadOps alarm to work with Amazon Simple Notification Service (SNS) to send you a text message and email notice right away if things get too busy, before the bill gets high! (A billing alert will let you know too late - after it has reached the threshold.)
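A hedged sketch of such an alarm with the AWS SDK for JavaScript v3; the volume id, threshold, and SNS topic ARN are placeholders you would replace with your own:

```typescript
import { CloudWatchClient, PutMetricAlarmCommand } from '@aws-sdk/client-cloudwatch';

const cloudwatch = new CloudWatchClient({});

await cloudwatch.send(new PutMetricAlarmCommand({
  AlarmName: 'ebs-write-ops-spike',
  Namespace: 'AWS/EBS',
  MetricName: 'VolumeWriteOps',
  Dimensions: [{ Name: 'VolumeId', Value: 'vol-0123456789abcdef0' }],
  Statistic: 'Sum',
  Period: 300,                  // 5-minute windows
  EvaluationPeriods: 1,
  Threshold: 50000,             // tune to your normal baseline
  ComparisonOperator: 'GreaterThanThreshold',
  AlarmActions: ['arn:aws:sns:us-east-1:111111111111:ops-alerts'], // SNS -> email/SMS
}));
```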
In general, though, from my experience you will not have the problem you outline. Scan the EC2 Discussion Forum at forums.aws.amazon.com, where you would find evidence of this kind of problem if it were prevalent; it does not seem to be happening.
@Dilpa yes, you are right. If a brute-force attack occurs against your website, e.g. somebody repeatedly hitting your login page, it will increase the server I/O if you have logging enabled for your web server. The web server writes every event to its log files, and that will increase your I/O. Just check your web server logs for that kind of attack so you can prevent it.

Clustered Jboss servers - Maintaining a unique count

We have a web service receiving requests at a very high frequency.
The application is deployed on two JBoss servers in a cluster (for load balancing).
We process the requests and classify some of them as "good requests".
Now we want to treat a certain percentage of the "good requests" specially (send them downstream to another system). This percentage value is configurable.
For example, if the percentage is 75%, then for every 4 "good requests" we receive, 3 of them have to be treated specially (sent downstream) and the 4th one must be ignored.
I do not want to add the process of determining whether a "good request" has to be sent downstream to the existing system, since that would slow down processing.
Here is the solution I thought for this problem.
Create a JMS queue on each JBoss server and send the "good requests" to the queue.
Another module/application (name: downstream module) will read the JMS queue. It will maintain a count to determine whether a good request has to be sent downstream or not.
Count is the number of requests received by the downstream module.
There are a couple of problems I foresee with the above approach, since the application is deployed on two servers.
The count is not going to be shared between the two servers; each "downstream module" will have its own count.
Since the count is not shared, it is possible that we do not send the configured percentage.
For example, if the percentage is 75% and we receive 6 requests, it is possible that 3 went to server1 and 3 went to server2, so the count in each "downstream module" will be 3.
So we would not have effectively sent even 1 "good request" to the downstream system, which is not desirable.
I thought I could maintain a table in the database into which "good requests" are inserted, generating a unique number (Good_request_count), and use the Good_request_count in each "downstream module" to decide whether the request has to be treated specially or not.
But I am afraid my solution is very inefficient, since it would involve the following
Having two JMS queues (one for each JBoss server).
Having two "downstream modules" (one on each server) to determine whether a good request has to be sent downstream.
Each downstream module has to insert some key value for the "good request" and read a unique count value from the database.
The underlying problem is: I have two servers and I want a single, shared count across both.
Can anyone please suggest a better solution, or point out inefficiencies in the system that can be improved?
Please let me know if I am not clear in any part, I can explain that.
Thanks a ton for reading!
Here are two other strategies to consider:
Create an MBean containing the code that keeps the count and deploy it in the deploy-hasingleton directory on each server in your cluster. You can refer to the MBean via a JNDI lookup, and you are guaranteed that only a single instance of it will be accessible across the cluster.
If you are using a more recent version of JBoss, you have the option of using a singleton session EJB to keep track of the count.
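However the cluster-wide counter is obtained (HA-singleton MBean, singleton session EJB, or a database sequence), the percentage decision itself is straightforward. A language-agnostic sketch, written here in TypeScript for brevity; nextClusterWideCount() is a placeholder for that lookup:

```typescript
declare function nextClusterWideCount(): Promise<number>; // returns 1, 2, 3, ...

// With 75%: counts 1, 2, 3 are sent downstream, count 4 is ignored, and so on.
export async function shouldSendDownstream(percentage: number): Promise<boolean> {
  const count = await nextClusterWideCount();
  const windowSize = 100 / gcd(percentage, 100);         // e.g. 4 for 75%
  const sendPerWindow = percentage / (100 / windowSize);  // e.g. 3 for 75%
  return ((count - 1) % windowSize) < sendPerWindow;
}

function gcd(a: number, b: number): number {
  return b === 0 ? a : gcd(b, a % b);
}
```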