I am using New Relic to get some info on how long response times are taking. I have also been doing load tests using Blitz. I can see on New Relic that a lot of the API endpoints average around 300 ms (which I am totally happy with; these are geospatial queries, btw). The only issue is that the maximum is 55k ms, and some users are complaining that certain things take a while to load.
How can I make these endpoints more reliable, so that they take 300 ms far more often than 55k ms?
edit:
The main question is: why do these responses sometimes take 55k ms? Is this down to the user's connection speed or to the code?
I have a client with a pretty popular ticket-selling service, to the point that the microservice-based backend is struggling to keep up. I need to come up with a solution to optimize and load-balance the system. The infrastructure works through a series of interconnected microservices.
When a user enters the sales channels (mobile or web app), the request is directed to an AWS API Gateway, which is in charge of orchestrating the communication towards the microservice responsible for obtaining the requested resources.
These resources are provided by a third-party API.
This third party has physical servers in each venue in charge of synchronizing the information between the POS systems and the digital sales channels.
We have a Redis instance in charge of caching the requests we make to the third-party API; we cache each endpoint with a TTL relative to how frequently its information is updated.
Here is some background info:
We get traffic mostly from 2 major countries
On a normal day, about 100 thousand users will use the service, with a 70/30 traffic split between the two countries
On important days, each country has different opening hours (Country A starts sales at 10 am UTC, but Country B starts at 5 pm UTC); on these days the traffic increases some n times
We have a main middleware through which all requests made by clients are processed.
We have a Redis cache database that stores GET responses with different TTLs for each endpoint.
We have a middleware that decides whether to serve the request from the cache or to call the third party's API, as the case may be.
And these are the complaints I have gotten that need to be dealt with:
When one country receives a high amount of requests, the country with the least traffic gets negatively affected: the clients do not respond, or respond only partially, because the computation layer's limit was exceeded, and so those users have a bad experience
Every time the above happens, the computation layer must be scaled up manually from the infrastructure side.
Each request has different response times, stadiums respond in +/- 40 seconds and movie theaters in 3 seconds. These requests enter a queue and are answered in order of arrival.
The error handling is not clear. The errors are mixed together, and you can't tell which country the errors are coming from or how many there are
The responses from the third-party API are not cached correctly in the cache layer, since error responses are stored for the full TTL
I was thinking of a couple of things I could suggest:
Adding in instrumentation of the requests by using AWS X-Ray
Adding in a separate table for errors in the redis cache layer (old data has to be better than no data for the end user)
Adding in AWS elastic load balancing for the main middleware
But I'm not sure how realistic it would be to implement these 3 things, and I'm not sure whether they would even solve the problem; I personally don't have much experience with optimizing this type of backend. I would appreciate any suggestions, recommendations, links, documentation, etc. I'm really desperate for a solution to this problem.
A few thoughts:
When a country receives a high amount of requests, the country with the least traffic gets negatively affected, the clients do not respond, or respond partially because the computation layer's limit was exceeded and so the users have a bad experience
A common approach in AWS is to regionalize the stack - assuming you are using CDK/CloudFormation, creating a regionalized stack should be a straightforward task.
But it is questionable whether this alone will solve the problem. Your system suffers from availability issues, and regionalization will only isolate the problem to individual regions. So we should be able to do better (see below).
Every time the above happens, the computation layer must be manually increased from the infrastructure.
AWS has an option to automatically scale up and down based on traffic patterns. This is a neat feature, provided you set limits to make sure you are not overcharged.
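As an illustration only - assuming the computation layer runs as an ECS service (the cluster and service names below are made up, and the exact scalable target depends on your setup) - a target-tracking policy via boto3 would replace the manual scaling while keeping hard capacity limits:

import boto3

autoscaling = boto3.client("application-autoscaling")

# Register the ECS service as a scalable target, with min/max bounds so the bill stays capped.
autoscaling.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId="service/ticketing-cluster/resource-service",  # hypothetical names
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=2,
    MaxCapacity=20,
)

# Scale out/in automatically to keep average CPU around 60%.
autoscaling.put_scaling_policy(
    PolicyName="cpu-target-tracking",
    ServiceNamespace="ecs",
    ResourceId="service/ticketing-cluster/resource-service",
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 60.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
    },
)

The same idea applies to Lambda concurrency or EC2 Auto Scaling groups; the point is that the "manually increase the computation layer" step becomes a policy.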
Each request has different response times, stadiums respond in +/- 40 seconds and movie theaters in 3 seconds. These requests enter a queue and are answered in order of arrival.
It seems that the large variance is because you have to contact the servers at the venues. I recommend decoupling that activity: calls to venues should be done asynchronously. There are several ways you could do that - queues with client push/pull are the standard approaches (please comment if more details are needed, but this is a quite standard problem with lots of material on the internet). A rough sketch of the queue side follows below.
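For example, a minimal sketch of the enqueue side using SQS (the queue URL and payload shape are invented for illustration): the API accepts the request immediately and returns a job id, while a worker does the slow 40-second venue call in the background.

import json
import uuid
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/venue-requests"  # hypothetical

def enqueue_venue_request(venue_id: str, resource: str) -> str:
    """Accept the request right away and hand the slow venue call to a worker."""
    job_id = str(uuid.uuid4())
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"job_id": job_id, "venue_id": venue_id, "resource": resource}),
    )
    # The client then polls for (or is pushed) the result once a worker
    # has finished talking to the venue server.
    return job_id

This also stops the slow stadium requests from blocking the fast movie-theater requests behind them in a single FIFO queue.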
The error handling is not clear. The errors are mixed up and you can't tell from which country the errors are coming from and how many errors there are
That's a code fix, assuming you send logs to CloudWatch (do you?). You could attach the country as context to every request, via a filter or something similar, so that when an error is logged that context is logged along with it. You probably need the venue id even more than the country, since you can derive the country from the venue id.
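A minimal sketch of that idea with Python's standard logging (the field names and handler choice are assumptions; swap the StreamHandler for whatever ships your logs to CloudWatch, e.g. the watchtower library):

import logging

class RequestContextFilter(logging.Filter):
    """Injects per-request context (hypothetical fields) into every log record."""
    def __init__(self, venue_id: str, country: str):
        super().__init__()
        self.venue_id = venue_id
        self.country = country

    def filter(self, record: logging.LogRecord) -> bool:
        record.venue_id = self.venue_id
        record.country = self.country
        return True

logger = logging.getLogger("ticketing")
handler = logging.StreamHandler()  # replace with a CloudWatch handler in production
handler.setFormatter(logging.Formatter(
    "%(asctime)s %(levelname)s country=%(country)s venue=%(venue_id)s %(message)s"
))
logger.addHandler(handler)
logger.addFilter(RequestContextFilter(venue_id="venue-123", country="A"))

logger.error("third-party API returned 502")  # now carries country + venue id

Once every error line carries the venue/country, counting errors per country is a CloudWatch metric filter rather than guesswork.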
The responses from the third party API are not cached correctly in the cache layer since errors are stored for the time of the TTL
Don't store error responses in the cache, and add a circuit breaker pattern around the third-party calls.
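A minimal single-process sketch of both ideas together (thresholds are illustrative; in production you would keep the breaker state in Redis so every instance shares it, and you would likely serve stale data while the breaker is open):

import time
import redis

r = redis.Redis()

FAILURE_THRESHOLD = 5      # trip the breaker after 5 consecutive failures
COOL_DOWN_SECONDS = 30
_failures = 0
_opened_at = 0.0

def get_resource(key: str, ttl: int, fetch):
    """Cache-aside lookup that never caches errors and short-circuits a failing upstream."""
    global _failures, _opened_at

    cached = r.get(key)
    if cached is not None:
        return cached

    # Breaker open: don't hammer the third party while it is failing.
    if _failures >= FAILURE_THRESHOLD and time.time() - _opened_at < COOL_DOWN_SECONDS:
        raise RuntimeError("third-party API temporarily unavailable")

    try:
        value = fetch()                  # the real call to the third-party API
    except Exception:
        _failures += 1
        if _failures == FAILURE_THRESHOLD:
            _opened_at = time.time()
        raise                            # note: nothing is written to Redis on failure

    _failures = 0
    r.setex(key, ttl, value)             # only successful responses get cached with a TTL
    return value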
I have hosted my Node app in Cloud Run, and all of my requests are served within 300 - 600 ms. But one endpoint gets data from a 3rd-party service, so that request takes 1.2 s - 2.5 s to complete.
My questions regarding this are:
Are 1.2 s - 2.5 s requests suitable for Cloud Run? Or is there a rule that requests should be completed within some xx ms?
Also, see the screenshot: I got a message along with the request in the logs, "The request caused a new container instance to be started and may thus take longer and use more CPU than a typical request".
What caused a new container instance to be started?
Is there any alternative or work around to handle long requests?
Any advice / suggestions would be greatly appreciated.
Thanks in advance.
I don't think that will be an issue unless you're worried about the cost of the CPU/memory time, which honestly should only matter if you're getting 10k+ requests/day. So it probably doesn't matter, and Cloud Run can handle that just fine (my own app serves requests longer than that with no problem).
It's possible that your service was "scaled to zero", meaning that there were no containers left running to serve requests. In that case, a new instance has to be started, and you pay whatever initialization/startup cost is associated with that process. It's also possible that it was auto-scaled because all other instances were at their request limits. Make sure that your setting for max concurrent requests per instance is set greater than one - Node/Express can handle multiple requests at once. Plus, you'll only get charged for the total time spent, not per request.
In situations where you have very long (30 seconds, minutes+) operations, it may be a good idea to switch to a different data-transfer method. You could use polling, where the client makes a request every 5 seconds and checks whether the response is ready. You could also switch to some kind of push-based system like WebSockets, but Cloud Run doesn't support that. A rough sketch of the polling pattern is below.
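Purely as an illustration of the client side of polling (your service is Node, but the sketch is Python; the /jobs endpoints and response shape are assumptions, not Cloud Run APIs):

import time
import requests

BASE_URL = "https://example-service.a.run.app"  # hypothetical Cloud Run URL

def fetch_slow_result(payload: dict, interval: float = 5.0, timeout: float = 120.0):
    """Start the slow job, then poll a cheap status endpoint instead of holding one long request."""
    job = requests.post(f"{BASE_URL}/jobs", json=payload).json()
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = requests.get(f"{BASE_URL}/jobs/{job['id']}").json()
        if status["state"] == "done":
            return status["result"]
        time.sleep(interval)   # poll every few seconds
    raise TimeoutError("job did not finish in time")

Each poll is a short request, so no single request has to stay open for the full duration of the upstream call.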
TL;DR: longer requests (~10-30 seconds) should be fine, unless you're worried about the cost of the increased compute time they may incur at scale.
We want to collect some metrics about our clients' public Facebook pages (~1-5K pages) on a daily (or weekly) basis.
I'm talking about 3-5 typical metrics: "likes", "fan posts", etc.
I understand that according to the "Rate Limiting on the Graph API" documentation [1] it's possible to have 200 calls per 1 hour.
For now we don't have any public FB application that could help us increase this limit. We will create one in order to generate an application token, but I doubt it will have a lot of users.
Does anybody know whether we will have problems with rate-limit exceptions while invoking the Graph API more than 200 times per 60 min.?
I guess our expected rate is 5-10K calls per 60 min (once a day).
Phrase from the documentation [1] "Rate limiting in the FB Graph API is encountered only in rare circumstances" gives me hope that it won't be a problem.
Thank you!
[1] https://developers.facebook.com/docs/graph-api/advanced/rate-limiting
You won't have any problems initially. Facebook does not necessarily block apps immediately for being over the limits.
As per their documentation
If your app is making enough calls to be considered for rate limiting by our system, we return an X-App-Usage HTTP header
So, if you don't get any X-App-Usage header, then your app hasn't been considered "worthy" of throttling by their automated systems yet.
So it would be best to check for this header while making your API requests. Once you start receiving this header, it would be best to change the frequency of your API calls or back off for a while.
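A small sketch of what that check could look like (the Graph API version, the fan_count field, and the 80% threshold are illustrative choices, not requirements):

import json
import requests

def get_page_metrics(page_id: str, access_token: str) -> dict:
    resp = requests.get(
        f"https://graph.facebook.com/v19.0/{page_id}",   # API version is illustrative
        params={"fields": "fan_count", "access_token": access_token},
    )

    usage_header = resp.headers.get("X-App-Usage")
    if usage_header:
        # Header is a JSON blob of usage percentages, e.g. {"call_count": 28, ...}
        usage = json.loads(usage_header)
        if usage.get("call_count", 0) > 80:
            # Getting close to the limit: slow down or pause before the next call.
            pass
    return resp.json()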
I have a Python program which queries YouTube to get video details, using the version 3 API. I run m Python processes, each with a pool of 10 worker processes.
from multiprocessing import Pool

# Pool of 10 worker processes; getVideo fetches the details for one song/video.
songs_pool = Pool(processes=10)
return_pool = songs_pool.map(getVideo, songs_list)
I get some client errors when the value of m is increased to more than 2 and the pool size is increased to more than 5 - I get forbidden errors. When I check the number of requests in the Google developers console analytics, it shows the number of requests as 250 per second, but according to the documentation the limit is 3000 requests per second. I don't understand why I am getting the client errors. Can you tell me if there is a way to avoid these errors and run the program quicker?
If m = 2 and processes = 10, I get no errors, but it takes so much time to complete.
But if I increase them, I get client errors on ~5% of the total requests.
The per-user limit is 3000 requests per second from a single IP address, and as soon as you go above that in a given second you'll start getting the forbidden errors. The analytics you see in the developers console only report your average number of requests over a 5-minute period; therefore, if you had zero requests for 4 minutes and then started running your routine, the console may show only 250 requests per second (as an average), but your app is likely overrunning the limit in any given second or two.
It seems that you're handling it in the best way possible if speed is your concern; you'll want to run it fast enough to get a very small number of errors (so you know you're staying right at your limit). Another option, though, might be to look into using etags; if you find yourself requesting info on the same videos a lot, you can let etags tell you whether or not any info has changed (and if the API responds that nothing has changed, it doesn't count against either your quota or your requests/sec).
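A rough sketch of the etag idea with the v3 videos endpoint (the in-memory cache and API key are placeholders; getVideo from the question could be adapted along these lines):

import requests

API_KEY = "YOUR_API_KEY"          # placeholder
etag_cache = {}                   # video_id -> (etag, last_response)

def get_video(video_id: str) -> dict:
    """Fetch video details, sending the cached ETag so unchanged data comes back as 304."""
    headers = {}
    if video_id in etag_cache:
        headers["If-None-Match"] = etag_cache[video_id][0]

    resp = requests.get(
        "https://www.googleapis.com/youtube/v3/videos",
        params={"part": "snippet,statistics", "id": video_id, "key": API_KEY},
        headers=headers,
    )

    if resp.status_code == 304:          # nothing changed since the last fetch
        return etag_cache[video_id][1]

    data = resp.json()
    etag_cache[video_id] = (data["etag"], data)
    return data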
I'm developing a web service that measures site response time. It's a Django app, and as an initial test I pointed it at a couple of my other Django sites on the same VPS. The response time was small (~5ms) most of the time, but on a fairly regular (5 or 10 min) schedule, jumped to a much higher value (up to 400ms, despite no heavy DB load or cache).
Suspicious of my own timing methodology, I pointed it at a static site on the same VPS and got a consistently quick response. I then used nginx's $request_time and $upstream_response_time logging to confirm that it really was my Django apps that were giving the response-time variance.
I then pointed the app at a few other Django sites around the web and found similar results: there's a fast "baseline" response with fairly regular spikes to one or two remarkably repeatable slower times. For example, one has a 25ms baseline with ~200ms and ~400ms "steps".
Using the Django Debug Toolbar's Time tab on one of my sites, I can see this behaviour. Most F5 reloads are quick, but there's an occasional (fairly consistently one in ten) slow one, with all the time spent in the "request" section.
Any ideas?
Solved it: it was gunicorn's "max-requests" setting, which spawns a new worker every X connections. When X is ten and I'm hitting a "quiet" site, the long response time shows up on every tenth hit.
That's why I only noticed the multi-modal times on some Django sites: the others were presumably busy or not using gunicorn.
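For reference, a minimal gunicorn config sketch (values illustrative): max_requests recycles each worker after that many requests, and max_requests_jitter staggers the restarts so they don't line up with a user's reloads the way a bare max_requests of ten did here.

# gunicorn.conf.py - illustrative values
bind = "127.0.0.1:8000"
workers = 3

# Recycle each worker after roughly this many requests (guards against leaks)...
max_requests = 1000
# ...but add jitter so workers don't all restart in lockstep, which is what
# produced the regular one-in-ten slow responses.
max_requests_jitter = 100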