How to relieve a rate-limited API?

We run a website which relies heavily on the Amazon Product Advertising API (APAA). When we experience a sudden spike in users, we hit the rate limit and all functions relying on the APAA shut down for a while. What can we do so that doesn't happen?
We do have some basic caching in place, but the APAA doesn't allow us to cache data for very long, and APAA queries vary so much that there may not be any cached data to fall back on.

I think your only option is to retry the API calls until they succeed, but to do so in a smart way. That's what everybody who gets throttled ends up doing, and AWS expects clients to handle it themselves.
You can implement exponential backoff and add jitter so that retries don't cluster together. AWS has a great blog post about solutions for this kind of problem: https://www.awsarchitectureblog.com/2015/03/backoff.html
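As a rough illustration, here is a minimal Python sketch of the "full jitter" strategy from that post; `ThrottledError` is a placeholder for whatever throttling exception your API client actually raises:

```python
import random
import time

class ThrottledError(Exception):
    """Placeholder for whatever throttling exception your client raises."""

def call_with_backoff(fn, max_retries=8, base=0.5, cap=30.0):
    """Retry fn() using exponential backoff with full jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except ThrottledError:
            if attempt == max_retries - 1:
                raise  # give up after the last attempt
            # Full jitter: sleep a random amount between 0 and an
            # exponentially growing ceiling, so retries don't cluster.
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```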

Related

General guidance around Bigtable performance

I'm using a single-node Bigtable cluster for my sample application running on GKE. Autoscaling has been incorporated into the client code.
Sometimes I experience slowness (>80 ms) for GET calls. In order to investigate further, I need some clarity on the following Bigtable behaviour.
I have cached the Bigtable table object to ensure faster GET calls. Is the table object persistent on GKE? I have learned that objects are not persistent on Cloud Functions. Should I expect similar behaviour on GKE?
I'm using service account authentication, but how frequently do auth tokens get refreshed? I have seen frequent refresh logs for the gRPC Java client. I suspect Bigtable can't serve requests during this token refresh period (4-5 seconds).
What if the client machine/instance doesn't scale enough? Will that cause slow GET calls?
Bigtable client libraries use connection pooling. How frequently do connections/channels close themselves? I have learned that connections are closed after minutes of inactivity (>15 minutes or so).
I'm planning to read only the needed columns instead of the entire row. This can be achieved by specifying the row key as well as a column qualifier filter. Can I expect some performance improvement from not reading the entire row?
The official GCP docs list the common causes of slower Bigtable performance; I would suggest going through them. Also see Troubleshooting performance issues.
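On the last point, restricting a read to specific columns is done with row filters. A minimal sketch with the Python client (the project, instance, table, family and key names are made up); note that narrower reads mainly reduce network transfer and client-side deserialization:

```python
from google.cloud import bigtable
from google.cloud.bigtable import row_filters

client = bigtable.Client(project="my-project")  # hypothetical names
instance = client.instance("my-instance")
table = instance.table("my-table")

# Chain a column family filter with a column qualifier filter so only
# the needed cells come back, instead of every cell in the row.
read_filter = row_filters.RowFilterChain(filters=[
    row_filters.FamilyNameRegexFilter("stats"),
    row_filters.ColumnQualifierRegexFilter(b"temperature"),
])

row = table.read_row(b"device#1234", filter_=read_filter)
```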

Load testing AWS SDK client

What is the recommended way to performance test AWS SDK clients? I'm basically just listing/describing resources and would like to see what happens when I query 10k objects. Does AWS provide some type of mock API, or do I really need to request 10k of each type of resource to do this?
I can of course mock in at least two levels:
SDK: I wrap the SDK with my own interfaces and create mocks. This doesn't exercise the SDK's JSON-to-object deserialization code, and my mocks affect the AppDomain with additional memory, garbage collection, etc.
REST API: As I understand it, the SDKs are just wrappers around the REST API (hence the HTTP response codes shown in the objects). It seems I can configure the SDK to point at custom endpoints.
This isolates the mocks from the main AppDomain and is more representative, but of course I'm still making some assumptions about response time, limits, etc.
Besides the above taking a long time to implement, I would like to make sure my code won't fail at scale, either locally or at AWS. The only way I see to guarantee that is creating (and paying for) the resources at AWS. Am I missing anything?
When you query 10k or more objects you'll have to deal with:
Pagination - the API usually returns only a limited number of items per call, providing a NextToken for the next call (see the sketch after this list).
Rate Limiting - if you hammer some AWS APIs too much they'll rate-limit you, which the SDK will typically surface as some kind of rate-limit-exceeded exception.
Memory usage - hopefully you don't collect all the results in memory before processing. Process them as they arrive to conserve your operating memory.
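For illustration, a minimal boto3 sketch that addresses all three points; the bucket name is made up, and the 'adaptive' retry mode is one of botocore's built-in modes that backs off automatically when throttled:

```python
import boto3
from botocore.config import Config

# Built-in adaptive retry mode backs off automatically on throttling.
s3 = boto3.client(
    "s3",
    config=Config(retries={"max_attempts": 10, "mode": "adaptive"}),
)

# Paginators follow the continuation tokens for you and yield one page
# at a time, so you never hold all 10k objects in memory at once.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="my-bucket"):  # made-up bucket name
    for obj in page.get("Contents", []):
        print(obj["Key"])  # process each object as it arrives
```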
Other than that I don't see why it shouldn't work.
Update: Also check out Moto, the AWS mocking library for Python, which can also run in a standalone server mode for use with other languages. However, as with any mock, it may not behave 100% the same as the real thing, for instance around rate-limiting behaviour.
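For example, here's a sketch of pointing the real SDK at a standalone Moto server; it assumes `pip install "moto[server]"` and a server started with `moto_server -p 5000`, and the credentials are dummies:

```python
import boto3

# The real SDK, pointed at the local mock endpoint instead of AWS.
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:5000",
    region_name="us-east-1",
    aws_access_key_id="testing",       # Moto accepts dummy credentials
    aws_secret_access_key="testing",
)

s3.create_bucket(Bucket="load-test-bucket")
for i in range(10_000):  # cheaply create the 10k objects to list against
    s3.put_object(Bucket="load-test-bucket", Key=f"obj-{i}", Body=b"x")
```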

I need feedback on this partly serverless architecture design

I want to host a scalable blog, or an application of this sort, in Node.js on AWS, making use of AWS technologies. The idea is to have a small EC2 server that is not responsible for serving the website, but only for running the CMS/admin panel. While these operations could be serverless as well, I think a dedicated small EC2 instance could be more efficient, and it works better with existing frameworks, etc.
In my diagram above, you can see there are two types of users: audiences and admins/writers. Admin CRUD operations also trigger a Lambda function, which regenerates the static site after admin changes and delivers it to S3. Users are directed to the static site hosted in S3. Only admins/writers have access to the server-connected part of the site.
I think this is a good design for an extremely scalable and relatively cheap site, as long as the user-facing side is all static. An alternative to this is a CDN, but then I have to deal with cache invalidation issues, a site that updates slower, and a larger server.
This seems like a win-win to me. Feedback?
This ought to be a comment rather than an answer, but as I don't have enough points...
There are a couple of other considerations for this architecture. Lambda functions are great for scaling microservices out horizontally, with each small function being executed in parallel tens or hundreds of times. Generation of a static site is typically a single-threaded operation, so you may not see the gains you expect. You'll also need to watch the timeout period (maximum 300 seconds currently) and make sure that you can generate the site in that time. Of course, if you are not running Lambda code you are not getting charged.
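For a feel of what the regeneration function might look like, here's a minimal sketch (in Python, for consistency with the other examples here); the bucket name and the `render_pages` generator are hypothetical stand-ins for whatever the CMS actually produces:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-static-site-bucket"  # hypothetical bucket name

def render_pages(event):
    # Placeholder: a real CMS would render templates from the database here.
    yield "index.html", "<html><body>Hello</body></html>"

def handler(event, context):
    """Regenerate the static pages after an admin change and push them to S3."""
    for path, html in render_pages(event):
        s3.put_object(
            Bucket=BUCKET,
            Key=path,
            Body=html.encode("utf-8"),
            ContentType="text/html",
        )
    return {"regenerated": True}
```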
For your admin frontend I would suggest Elastic Beanstalk; even if you peg it at a single instance, it gives you lots of great features like rolling updates.
Good luck with the project.

How do the big boys deal with API limits?

We have an app that interacts with Facebook a lot, intensively enough to make us worry about the API limits that we know are there. My question is: how is it that some applications have millions of users who proactively engage with Facebook, yet never hit the API limits? One such application is Hootsuite.
Do they implement a sophisticated load-reduction mechanism? (Queues, batches and caches come to mind.)
Does Facebook somehow treat them specially? (A partnership, perhaps?)
Both options are possible.
I would recommend some form of load-reduction mechanism. This could be accomplished by caching data or by executing heavy queries ahead of time (possibly in a cron job of sorts).
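As a rough sketch of the caching idea in Python (the 5-minute TTL and the `graph_api.get_object` call are assumptions; in production a shared store such as Redis would usually replace the in-process dict):

```python
import time

class TTLCache:
    """Minimal in-process TTL cache."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key):
        hit = self._store.get(key)
        if hit is not None and time.time() - hit[1] < self.ttl:
            return hit[0]
        return None

    def set(self, key, value):
        self._store[key] = (value, time.time())

profile_cache = TTLCache(ttl_seconds=300)  # 5-minute TTL is an assumption

def get_profile(graph_api, user_id):
    """Serve repeated lookups from cache so they don't consume API quota."""
    cached = profile_cache.get(user_id)
    if cached is not None:
        return cached
    profile = graph_api.get_object(user_id)  # hypothetical Graph API client call
    profile_cache.set(user_id, profile)
    return profile
```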
Facebook provides some good suggestions with regard to application API rate limiting here.
You can also get more information on rate limiting that is being enforced on your application by visiting this dashboard:
https://developers.facebook.com/apps/<app_id>/insights?ref=nav&sk=ae_<app_id>

How can I scale a web app with a long response time, which currently uses Django

I am writing a web application with Django on the server side. It takes ~4 seconds for the server to generate a response to the user. The application makes use of a weather API, and it has to make ~50 queries to that API for each user request.
The server side uses Python's urllib to call the weather API. I used Python's threading to speed up the process, because urllib is synchronous. I am using WSGI with Apache. The problem is that the WSGI stack is fully synchronous, and when many users use my application, they have to wait for one another's requests to finish. Since each request takes ~4 seconds, this is unacceptable.
I am kind of stuck, what can I do?
Thanks
If you are using mod_wsgi in a multithreaded configuration, or even a multi-process configuration, one request should not block another from being able to do something; they should be able to run concurrently. If you are using a multithreaded configuration, are you sure you aren't using some locking mechanism on a resource within your own application which precludes requests running through the same section of code? Another possibility is that you have configured the Apache MPM and/or mod_wsgi daemon mode poorly, so as to preclude concurrent requests.
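For reference, a mod_wsgi daemon mode configuration along these lines allows concurrent requests; the process/thread counts and script path here are only illustrative:

```apache
WSGIDaemonProcess myapp processes=4 threads=25
WSGIScriptAlias / /var/www/myapp/wsgi.py process-group=myapp application-group=%{GLOBAL}
```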
Anyway, as mentioned in another answer, you are much better off looking at caching strategies to avoid the weather lookups in the first place, or offloading the work to the client.
50 queries to an outside resource per request is probably a bad place to be, and probably not necessary at all.
The weather doesn't change all that quickly, so you can probably benefit enormously by just caching results for a while. Then it doesn't matter how many requests you're getting; you don't need to make more than a few queries per day.
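For instance, a minimal sketch using Django's cache framework (the TTL and the `fetch_weather_from_api` helper are placeholders for your existing lookup code):

```python
from django.core.cache import cache

WEATHER_TTL = 60 * 30  # 30 minutes; tune to how stale you can tolerate

def get_weather(location_key):
    """Return weather data for a location, caching results between requests."""
    data = cache.get(location_key)
    if data is None:
        data = fetch_weather_from_api(location_key)  # your existing urllib call
        cache.set(location_key, data, WEATHER_TTL)
    return data
```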
If that's not your situation, you might be able to get the client to do the work for you. Refactor the code so that the weather API aggregation happens on the client in JavaScript, rather than funneling it all through the server.
Edit: Based on comments you've posted, what you are asking for probably cannot be optimized within the constraints of the API you are using. The problem is that the service does a good job of abstracting away the differences between the many sources of weather information it aggregates into a nearest-location query. After all, weather stations provide only point data.
If you talk directly to the technical support people who provide the API, you might find that they are willing to support more complex queries (e.g. a bounding box), and will give you instructions. More likely, though, they abstract that away because they don't want to reveal the actual resolution their API provides, or because there is some technical reason in the way they model their data or perform their calculations that would make such queries too difficult to support.
Without that or caching, you are just out of luck.