We have an app that interacts with Facebook a lot, intensively enough to make us worry about the API limits that we know are there. My question is: how is it that some applications have millions of users, actively engage with Facebook, and never hit the API limits? One such application is "hootsuite".
Do they implement sophisticated load-reduction mechanisms? (Queues, batches and caches come to mind.)
Does Facebook somehow treat them specially? (A partnership, perhaps?)
Both options are possible.
I would recommend some form of load-reduction mechanism. This could be accomplished with caching data or executing heavy queries ahead of time (possibly in a cron job of sorts).
Facebook provides some good suggestions with regard to application API rate limiting here.
You can also get more information on rate limiting that is being enforced on your application by visiting this dashboard:
https://developers.facebook.com/apps/<app_id>/insights?ref=nav&sk=ae_<app_id>
We run a website which relies heavily on the Amazon Product Advertising API (APAA). When we experience a sudden spike in users, we hit the rate limit and all functions relying on the APAA shut down for a while. What can we do to prevent that?
We have some basic caching in place, but the APAA doesn't allow us to cache data for very long, and APAA queries can vary a lot, so there may not be any cached data to serve at all.
I think your only option is to retry the API calls until they work, but to do so in a smart way. Unfortunately, that's what everybody who gets throttled does, and AWS expects people to handle that themselves.
You can implement exponential backoff and add jitter to prevent retries from clustering together. AWS has a great blog post about solutions for this kind of problem: https://www.awsarchitectureblog.com/2015/03/backoff.html
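A minimal sketch of exponential backoff with jitter, assuming the "full jitter" variant (each delay is drawn uniformly between zero and the capped exponential value). `ThrottledError`, `backoff_delays` and `call_with_backoff` are hypothetical names for illustration, not part of any AWS SDK; a real client would catch whatever error its library raises on throttling.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical exception standing in for a rate-limit response."""


def backoff_delays(base=0.5, cap=30.0, attempts=6, rng=random.random):
    """Yield 'full jitter' delays: uniform in [0, min(cap, base * 2**n)]."""
    for attempt in range(attempts):
        yield rng() * min(cap, base * 2 ** attempt)


def call_with_backoff(func, sleep=time.sleep, **backoff_args):
    """Call func, retrying on ThrottledError with jittered exponential delays."""
    for delay in backoff_delays(**backoff_args):
        try:
            return func()
        except ThrottledError:
            sleep(delay)
    # One last try after the final delay; any error now propagates to the caller.
    return func()
```

The random factor is what keeps many throttled clients from retrying at the same instant; without it, every client that failed together would also retry together.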
We have a number of SOAP and REST web services which provide legal information to clients. Management wants us to log all the information that is requested through these services. They want to use the logs to collect statistics and bill clients.
My colleague suggested using a central relational database for logging.
I don't like this solution, because the number of services is growing and I think such an architecture will become a performance bottleneck.
Can you advise me on what architectural design would be good for this kind of task?
When you say the central database will be a bottleneck, do you mean that it will be too slow to keep up with the logging requests? Or are you saying you are expecting database changes for each logging request type?
I would define a more generic payload for logging (figure out your minimum standardized fields), and then create a database for those logs.
<log><loglevel>INFO</loglevel><systemName>ClientValueActualizer</systemName><userIp>123.123.123.432</userIp><logpayload><![CDATA[useful payload for billing]]></logpayload></log>
If you are worried about capacity, you could throw a queue in front of it, which would have the advantage of not bogging down the client if the logs are busy.
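As a rough illustration of the queue idea, here is an in-process sketch: callers enqueue and return immediately, while a background thread writes batches to the log store. `QueuedLogger` and `write_batch` are hypothetical names; in a multi-service setup a message broker would more likely play the role of the queue, and this only shows the shape of the decoupling.

```python
import queue
import threading


class QueuedLogger:
    """Accepts log records quickly and writes them from a background thread."""

    def __init__(self, write_batch, max_pending=10_000, batch_size=100):
        self._write_batch = write_batch  # e.g. a bulk INSERT into the log database
        self._batch_size = batch_size
        self._pending = queue.Queue(maxsize=max_pending)
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def log(self, record):
        """Enqueue without blocking; return False if the buffer is full."""
        try:
            self._pending.put_nowait(record)
            return True
        except queue.Full:
            return False

    def _drain(self):
        while True:
            batch = [self._pending.get()]  # block until there is work
            while len(batch) < self._batch_size:
                try:
                    batch.append(self._pending.get_nowait())
                except queue.Empty:
                    break
            try:
                self._write_batch(batch)
            except Exception:
                pass  # sketch only: a real system would retry or divert the batch
            finally:
                for _ in batch:
                    self._pending.task_done()

    def flush(self):
        """Block until everything enqueued so far has been handed to write_batch."""
        self._pending.join()
```

Since the logs are used for billing, what happens when the buffer is full or a write fails matters; the sketch only reports or swallows those cases and leaves the policy open.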
You can decouple the consumption of these messages into separate systems, each of which can understand the various payloads. The risk here is that if you want to add new attributes, it will be difficult to control which systems are sending what. But that's just a general issue with decoupled services.
You can consider Apache Kafka as a distributed commit log. This would be good performance-wise, as it scales out horizontally and delivers messages only when a client pulls them.
I have a SOAP API that I would like to throttle access to on a per-user basis after "x" many calls have been received in "y" amount of time.
After searching around, the #1 consideration (obviously) is to consider your parameters for when to throttle users. However, I don't see much in the way of best practices/examples for implementing such a solution. I did see the Leaky Bucket Method which makes sense. I have to believe there are more ideas out there though.
Any other takers on how you go about implementing your throttling solution? Questions include:
Do any frameworks provide capabilities (e.g. Spring, etc.) for throttling in web apis?
Seems to me you would need to store access information per user. How do you minimize the database overhead for doing this EVERY call?
Do you even NEED to access a datastore to implement this?
For what it's worth, I've sort of answered this question after working on some other production projects.
Home brew: Using Spring AOP to pointcut around the method calls prior to executing API method code is one home-brew way if you have your own algorithm to implement. This ends up being pretty elegant and flexible as you can capture a lot of metadata prior to deciding what to do with the request.
API Management Service: If you're talking about a production system and you have the budget, probably the best way to go is to delegate this to an API Management layer like Apigee or Mashery.
The advantage is that it separates the concerns, so it's easier to change and allows you to focus just on your API. This is especially helpful if business stakeholders are involved and you need a good UI and dictionary of terms.
The disadvantage, of course, is the cost and the vendor lock-in.
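For the home-brew route, the leaky bucket idea mentioned in the question can be kept entirely in memory, which is one possible answer to the datastore question when there is only a single process. The sketch below is written in Python rather than Spring, and `LeakyBucket` and `UserThrottle` are hypothetical names. It is not thread-safe, and several nodes behind a load balancer would need some shared store instead of the plain dict.

```python
import time


class LeakyBucket:
    """Leaky bucket used as a meter: each call adds one unit, units drain at a fixed rate."""

    def __init__(self, capacity, leak_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.leak_per_second = leak_per_second
        self._clock = clock
        self._level = 0.0
        self._last = clock()

    def allow(self):
        now = self._clock()
        # Drain whatever leaked out since the previous call.
        leaked = (now - self._last) * self.leak_per_second
        self._level = max(0.0, self._level - leaked)
        self._last = now
        if self._level + 1 > self.capacity:
            return False  # bucket would overflow: throttle this call
        self._level += 1
        return True


class UserThrottle:
    """Keeps one bucket per user id in a plain dict (single process only)."""

    def __init__(self, calls, per_seconds, clock=time.monotonic):
        self._calls = calls
        self._rate = calls / per_seconds
        self._clock = clock
        self._buckets = {}

    def allow(self, user_id):
        bucket = self._buckets.get(user_id)
        if bucket is None:
            bucket = LeakyBucket(self._calls, self._rate, self._clock)
            self._buckets[user_id] = bucket
        return bucket.allow()
```

An interceptor (for example an AOP around-advice) would call `allow(user_id)` before the API method runs and reject the request when it returns `False`.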
Hope this helps someone!
Should caching, which is a cross-cutting concern, ever be turned into a web service?
The question might be a little weird, but I feel in SOA, a service should be identified based around a business solution, and we should not expose services whose only responsibility is to cache Objects. This does not seem to be a business function at all. If anything it seems like a performance improvement.
Should we ever introduce and implement a service just to cache data? Wouldn't that be a hindrance to thinking in terms of your domain model itself? I mean whenever you need an object to be cached within another service, you will have to move this class to the cacheService.
What is the general opinion about this?
I think improving performance for users is a business function. I think your question really is whether you should factor out your caching to a single service, or do it internally in other services. The answer, as always, is that it depends. But it certainly can work to have a dedicated caching service. For instance, Google does this for memcache.
EDIT: Again, there's more than one way to skin a cat. But you don't necessarily have to cache the definitive Person object. Another possibility is to use the caching service for rendered data, say a PDF account statement. For instance, Hi5 (a social networking site) uses memcache to cache fully prepared user profiles.
I don't think this is a very good idea. Typically, the point of caching is to make data available and keep it closer to the point of usage. Moving the cache out to a true software-as-a-service or other service model limits that ability.
A cache needs to be something fast, local and readily available.
If that service acts as a Facade to the previous non-cached service then yes, it could be "its own service". Especially for read-only or read-mostly data, this makes sense. But remember that caching, like threading, is a treacherous domain. It is easy to "do" but extremely difficult to do "correctly", and done incorrectly it can cause a ridiculous number of problems that are difficult to debug and difficult to fix once you find the cause. Think about it this way: content distribution providers like Akamai built their entire business on providing caching as a service.
I would agree with your thinking that web services should expose business functionality. The idea of a cache web service seems like a bad idea.
Thinking about this from a usage perspective, what does this mean really? It seems to imply that clients would first hit the cache web service to get an object. If it's not there, they'd need to get it from the real web service then "push" the object back into the cache? That's asking a lot of a client. Any caching workflow should be transparent to the client.
If, however, you're talking about backing some related web services with a cache that's "hidden" (and transparent) from the clients, used solely for the purpose of making the entire family of web services more responsive, then yes, it might be a good idea.
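To make the "hidden" cache concrete, here is a minimal read-through sketch: clients call `get` exactly as they would on the real service and never see whether the value came from the cache. `CachedService`, `fetch` and the time-based expiry are assumptions for illustration; a shared deployment would likely swap the dict for something like memcache.

```python
import time


class CachedService:
    """Facade with the same call shape as the backing service, plus a TTL cache."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch  # the real (uncached) lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: the backing service is not called
        # Miss or expired: fetch from the real service and remember the result.
        value = self._fetch(key)
        self._entries[key] = (now + self._ttl, value)
        return value
```

The miss-then-populate step lives inside the facade, so the client never has to push objects back into the cache itself.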
I'm currently working on an app that works with Twitter, but while developing/testing (especially those parts that don't rely heavily on real Twitter data), I'd like to avoid constantly hitting the API or publishing junk tweets.
Is there a general strategy people use for taking it easy on the API (caching aside)? I was thinking of rolling my own library that would essentially intercept outgoing requests and return mock responses, but I wanted to make sure I wasn't missing anything obvious first.
I would probably start by mocking the specific parts of the API you need for your application. In fact, this may actually force you to come up with a cleaner design for your app, because it more or less requires you to think about your application in terms of "what" it should do rather than "how" it should do it.
For example, if you are using the Twitter Search API, your application most likely should not care whether or not you are using the JSON or the Atom format option. The ability to search Twitter using a given query and get results back represents the functionality you want, so you should mock the API at that level of abstraction. The output format is just an implementation detail.
By mocking the API in terms of functionality instead of in terms of low-level implementation details, you can ensure that the application does what you expect it to do, before you actually connect to Twitter for real. At that point, you've already verified that the app works as intended, so the only thing left is to write the code to make the REST requests and parse the responses, which should be fairly straightforward, so you probably won't end up hitting Twitter with a lot of junk data at that point.
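One way this could look, as a sketch: the application depends on a small search capability, and an in-memory fake satisfies it during development. `TweetSearch`, `FakeTweetSearch` and `count_mentions` are hypothetical names, not part of any Twitter library; the real implementation would wrap the REST calls behind the same `search` method.

```python
from typing import Protocol


class TweetSearch(Protocol):
    """The capability the app needs: search by query, get tweet texts back."""

    def search(self, query: str) -> list[str]: ...


class FakeTweetSearch:
    """In-memory stand-in used while developing; never touches the network."""

    def __init__(self, tweets):
        self._tweets = list(tweets)
        self.queries = []  # recorded so tests can inspect what was asked

    def search(self, query):
        self.queries.append(query)
        return [t for t in self._tweets if query.lower() in t.lower()]


def count_mentions(searcher: TweetSearch, term: str) -> int:
    """Example application code: depends only on the search capability."""
    return len(searcher.search(term))
```

For example, `count_mentions(FakeTweetSearch(["Python is fun", "I like tea"]), "python")` runs without any network access, and nothing in `count_mentions` knows whether JSON or Atom sits behind the interface.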
Caching is probably the best solution. Besides that, I believe the API is limited to 100 requests per hour. So maybe write a function that counts each request and, as the count gets close to 100, starts pulling data less often (say, only on every 10th occasion it would otherwise have made a request). It wouldn't be a hard cutoff, more likely a gradient function that tapers off as you near the limit.
I've used Tweet#. It caches and should do everything you need, since it covers 100% of Twitter's API and then some...
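One possible shape for such a gradient, as a sketch under assumptions: spread the remaining request budget evenly over the time left in the window, so the wait grows as the budget shrinks. `next_poll_delay` and its parameters are hypothetical, and the limit of 100 per hour is the figure quoted above, not a verified one.

```python
def next_poll_delay(used, limit, seconds_left, min_delay=1.0):
    """Seconds to wait before the next request so the remaining budget lasts the window."""
    remaining = limit - used
    if remaining <= 0:
        return seconds_left  # budget spent: wait for the window to reset
    return max(min_delay, seconds_left / remaining)
```

With a fresh budget of 100 and a full hour left this gives 36 seconds between requests; with 90 already used and the same hour left it stretches to 360 seconds.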
http://dimebrain.com/2009/01/introducing-tweet-the-complete-fluent-c-library-for-twitter.html
Cache stuff in a database. If the cached data is too old, request the latest data via the API.
Also think about getting your application account whitelisted. That will give you a limit of 20,000 API requests per hour instead of the measly 100 (which is meant for a user, not an application).
http://twitter.com/help/request_whitelisting