RESTful API - handling large amounts of data - web-services

I have written my own RESTful API and am wondering about the best way to deal with large numbers of records returned from the API.
For example, a GET request to myapi.co.uk/messages/ brings back the XML for all message records, which in some cases could be thousands. This makes using the API very sluggish.
Can anyone suggest the best way of dealing with this? Is it standard to return results in batches and to specify batch size in the request?

You can change your API to include additional parameters to limit the scope of data returned by your application.
For instance, you could add limit and offset parameters to fetch just a portion of the collection; this is how pagination is typically done in accordance with REST. A request like the following would fetch 10 resources from the messages collection, the 21st through the 30th, letting the client ask for a specific slice of a huge data set:
myapi.co.uk/messages?limit=10&offset=20
Another way to decrease the payload is to ask for only certain parts of your resources' representation. Here's how Facebook does it:
/joe.smith/friends?fields=id,name,picture
Remember that while using either of these methods, you have to provide a way for the client to discover each of the resources. You can't assume they'll just look at the parameters and start changing them in search of data. That would be a violation of the REST paradigm. Provide them with the necessary hyperlinks to avoid it.
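To make this concrete, here's a minimal sketch of such a paginated endpoint in Python/Flask (the framework choice, route, data, and response shape are my assumptions for illustration, not part of the original API):

from flask import Flask, request, jsonify

app = Flask(__name__)

# Stand-in data; a real implementation would query a database instead.
MESSAGES = [{"id": i, "body": f"message {i}"} for i in range(1, 1001)]

@app.route("/messages")
def list_messages():
    # Paging parameters, with a default page size and an upper bound.
    limit = min(request.args.get("limit", 10, type=int), 100)
    offset = max(request.args.get("offset", 0, type=int), 0)

    page = MESSAGES[offset:offset + limit]

    # Hypermedia links let clients discover the rest of the collection
    # instead of guessing at parameter values.
    links = {"self": f"/messages?limit={limit}&offset={offset}"}
    if offset + limit < len(MESSAGES):
        links["next"] = f"/messages?limit={limit}&offset={offset + limit}"
    if offset > 0:
        links["prev"] = f"/messages?limit={limit}&offset={max(offset - limit, 0)}"

    return jsonify({"items": page, "total": len(MESSAGES), "links": links})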
I strongly recommend viewing this presentation on RESTful API design by Apigee (the screencast is called "Teach a Dog to REST"). Good practices and neat ideas for approaching everyday problems are discussed there.
EDIT: The video has been updated a number of times since I posted this answer; you can check out the 3rd edition from January 2013.

There are several general techniques for improving API performance, including for large responses. Each of the topics below can be explored in depth; a small streaming sketch follows the list.
Reduce Size Pagination
Organizing Using Hypermedia
Exactly What a User Needs With Schema Filtering
Defining Specific Responses Using The Prefer Header
Using Caching To Make Response More Efficient
More Efficiency Through Compression
Breaking Things Down With Chunked Responses
Switch To Providing More Streaming Responses
Moving Forward With HTTP/2
Source: https://apievangelist.com/2018/04/20/delivering-large-api-responses-as-efficiently-as-possible/
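To make one of these concrete, here's a minimal sketch of a chunked/streaming response in Python/Flask (the framework, route, and payload are assumptions for illustration):

from flask import Flask, Response
import json

app = Flask(__name__)

@app.route("/messages/stream")
def stream_messages():
    def generate():
        # Yielding from a generator makes Flask stream the response in
        # chunks, so the client can start processing records immediately
        # instead of waiting for the full body.
        for i in range(100000):  # stand-in for a large result set
            yield json.dumps({"id": i}) + "\n"
    return Response(generate(), mimetype="application/x-ndjson")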

If you are using .NET Core, try this magic package:
Microsoft.AspNetCore.ResponseCompression
Then add this line in ConfigureServices in the Startup file:
services.AddResponseCompression();
and this one in the Configure method:
app.UseResponseCompression();

Related

How does Ember Data manage large amounts of records?

I have been working with Ember Data and I'm trying to understand some concepts. I have a fairly data-intensive app; my back-end has endpoints that return a lot of records.
So, basically, I have Routes that do something like this.store.findAll('places'), which can return thousands of places, each one having several text-heavy fields such as services or description.
This is only one of the resources; there are a few more that handle that amount of data as well.
My main concern is that the app hits some kind of limit or becomes unresponsive. So my questions are: How does Ember Data manage large amounts of records? Is there any best practice for handling those kinds of scenarios?
How does Ember Data manage large amounts of records?
The same way it handles a small number of records. It's not going to do anything special for performance if you try to load/fetch a large number of records; you need to handle that yourself.
Is there any best practice to handle those kinds of scenarios?
Unfortunately, no. Pagination of some sort is really the only way to accomplish this. But as you can see in this thread, there's quite a bit of discussion about the "best" way to do it. There are adapters and plugins made to handle this scenario, as well as server-side boilerplate designed to make it easy. But there really is no canonical way of doing pagination with Ember Data.
In my opinion, the best way to handle large amounts of data is to design a query endpoint and implement it on your server, handling everything yourself. This will be the most tailored to your application and the easiest to understand. If it sounds complicated, that's because it is. Data-set segmentation/pagination is not a simple problem to solve; you will definitely run into issues along the way. That's why there's no agreed-upon best practice yet.
Update: Javier Cadiz mentioned the JSON API in the comments, so I thought I would mention it. The JSON API does seem to be the new de facto standard for Ember Data, and it does specify a pagination method. However, the JSON API is fairly new and isn't widely adopted yet. I believe it wasn't until very recently that Ember Data switched to the JSON API adapter as its default. Using its pagination would most likely require you to conform to the entire API, not just the pagination aspect. (Although you can always steal certain ideas from it.) Because of that, I'm not sure I'd call it a best practice just yet.
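For reference, JSON API pagination builds on a reserved page query-parameter family plus top-level pagination links (first/prev/next/last). A request and response might look roughly like this (the page[number]/page[size] strategy is one common choice, not mandated by the spec, and the data contents are elided):

GET /places?page[number]=3&page[size]=50

{
  "data": ["... place resources ..."],
  "links": {
    "first": "/places?page[number]=1&page[size]=50",
    "prev": "/places?page[number]=2&page[size]=50",
    "next": "/places?page[number]=4&page[size]=50",
    "last": "/places?page[number]=20&page[size]=50"
  }
}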
Bottom line: the JSON API way of pagination may be the way of the future, but it's not currently very popular. (Although that's just my opinion based on what I see/read. There's no saying how many people are using it privately.)

REST vs RPC for a C++ API

I am writing a C++ API which is to be used as a web service. The functions in the API take in images/paths_to_images as input parameters, process them, and give a different set of images/paths_to_images as outputs. I was thinking of implementing a REST interface to enable developers to use this API for their projects (independent of whatever language they'd like to work in). But I understand REST is a good fit only when you have a collection of data that you want to query or manipulate, which is not exactly the case here.
[The collection I have is of different functions that manipulate the supplied data.]
So, is it better for me to implement an RPC interface for this, or can this be done using REST itself?
Like lcfseth, I would also go for REST. REST is indeed resource-based, and in your case you might consider that there's no resource to deal with. However, that's not exactly true: the image converter in your system is the resource. You POST images to it and it returns new images. So I'd simply create a URL such as:
POST http://example.com/image-converter
The response would be an array with the paths to the new images.
Potentially, you could also have:
GET http://example.com/image-converter
which could tell you about the status of the image conversion (assuming it is a time-consuming process).
The advantage of doing it like that is that you are re-using HTTP verbs developers are already familiar with, and the interface is almost self-documenting (though of course you still need to document the formats accepted and returned by the POST call). With RPC, you would have to define new verbs and document them.
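For illustration, a client call against such an endpoint might look like this in Python with the requests library (the URL, field names, and response shape are assumptions, not a real service):

import requests

# POST an image to the converter resource; the service is assumed to
# respond with JSON listing the paths of the converted images.
with open("input.png", "rb") as f:
    resp = requests.post(
        "http://example.com/image-converter",
        files={"image": f},       # the image payload
        data={"format": "jpeg"},  # hypothetical conversion option
    )
resp.raise_for_status()
print(resp.json())  # e.g. {"outputs": ["/images/out-1.jpg"]}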
REST uses the common HTTP operations GET, POST, DELETE, HEAD, and PUT. As you can imagine, this is very data-oriented. However, there is no restriction on the data type and no restriction on the size of the data (none I'm aware of, anyway).
So it's possible to use it in almost every context, including sending binary data. One of the advantages of REST is that web browsers understand it, so your users won't need a dedicated application to send requests.
RPC presents more possibilities and can also be used; you can define custom operations, for example.
Not sure you need that much power given what you intend to do.
Personally I would go with REST.
Here's a link you might wanna read:
http://www.sitepen.com/blog/2008/03/25/rest-and-rpc-relationship/
Compared to RPC, a REST-style JSON interface is lightweight and easy for API users to work with, whereas RPC (SOAP/XML) seems complex and heavy.
I suspect that what you want is an HTTP+JSON-based API, not a REST API as defined by REST's author:
http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven

Windows Phone 7 - Best Practices for Speeding up Data Fetch

I have a Windows Phone 7 app that (currently) calls an OData service to get data and throws the data into a listbox. It is horribly slow right now. The first thing I can think of is that OData returns far more data than I actually need.
What are some suggestions/best practices for speeding up the fetching of data in a Windows Phone 7 app? Anything I could be doing in the app to speed up the retrieval of data and put it in front of the user faster?
Sounds like you've already got some clues about what to chase.
Some basic things I'd try are:
Make your HTTP requests as small as possible - if possible, only fetch the entities and fields you absolutely need (see the OData query example after this list).
Consider using multiple HTTP requests to fetch the data incrementally instead of fetching everything in one go (this can, of course, actually make the app slower, but generally makes the app feel faster)
For large text transfers, make sure that the content is being zipped for transfer (this should happen at the HTTP level)
Be careful that the XAML rendering the data isn't too bloated - a large XAML structure repeated in a list can cause slowness.
When optimising, never assume you know where the speed problem is - always measure first!
Be careful when inserting images into a list - the MS MarketPlace app often seems to stutter on my phone - and I think this is caused by the image fetch and render process.
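Since the service in question is OData, the first two points map directly onto OData's standard query options. A hypothetical example (service, entity, and field names are made up):
myservice.svc/Messages?$select=Id,Subject,Received&$top=20&$skip=40
$select trims each entity down to the named fields, while $top and $skip fetch the collection one page at a time.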
In addition to Stuart's great list, also consider the format of the data that's sent.
Check out this blog post by Rob Tiffany. It discusses performance based on data formats. It was written specifically with WCF in mind but the points still apply.
As an extension to Stuart's list:
In fact there are three areas - communication, parsing, and UI. Measure them separately:
Do just the communication with the processing switched off.
Measure parsing of a fixed OData-formatted string.
Believe it or not, the UI can also be the bottleneck.
For example, bad usage of a ProgressBar can result in a dramatic decrease in processing speed. (In general you should not use any UI animations, as explained here.)
Also, make sure that the UI processing does not block the data communication.

flash - django communication -- amf, xml, or json?

We are considering developing a Flash front-end for a web application written using Django. The Flash front-end will send a simple "id" to the server and receive a couple of objects in response. The application will be open only to authenticated users.
To the extent of my current knowledge (which is basic for Flash), we can either use AMF or take an XML or JSON approach. AMF seems to have the upper hand, as there are examples out on the internet showing it can cooperate easily with Django's authentication mechanism (most examples feature pyAMF). On the other hand, implementing an XML/JSON-based solution may be easier and hassle-free.
Guidance will be much appreciated.
We've used pyAMF + Django on many projects here, and it's a breeze to set up and get running. If you need speed, AMF3 is probably your best bet. It's the smallest/fastest way to transfer data, and serialization is taken care of for you.
On the flip side, setting up json with Django isn't much work either, and it would give you the ability to hook other, non-AMF systems into it without any extra work. You just sacrifice a little speed for that benefit.
If you think you'll ever need other systems working with the backend, or if you think you might switch to an HTML-only, or offer some kind of non-Flash version of your app, I'd go JSON, otherwise, I'd use AMF.
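For what it's worth, here's a minimal sketch of such a JSON endpoint in Django (the view name, URL parameter, and payload are made up for illustration):

from django.contrib.auth.decorators import login_required
from django.http import JsonResponse

@login_required  # the app is open only to authenticated users
def fetch_objects(request, obj_id):
    # Look up whatever objects correspond to the id sent by the Flash
    # front-end; hard-coded here as a stand-in for real model queries.
    payload = {
        "id": obj_id,
        "objects": [
            {"type": "first", "value": 1},
            {"type": "second", "value": 2},
        ],
    }
    return JsonResponse(payload)

On the Flash side, the as3corelib JSON classes mentioned below can decode the same payload.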
First of all, you should design your app in such a way that this doesn't matter: the transport layer should be completely encapsulated, leaving the encoding format transparent to the rest of the app.
Personally, I prefer JSON to AMF because it's human-readable (which makes debugging easier) and there are implementations for every platform/language (so you can reuse the server part with JavaScript, for example). And I prefer JSON to XML because it's more compact, semantically less ambiguous, and closer to common object models. It can also transport numerical and boolean data in a typesafe manner.
JSON will probably have the least complications, and there's a great Google Code project that has JSON encoders and decoders here: http://code.google.com/p/as3corelib/

How do you document a tree-to-tree transformation in a human-readable format?

I need to document an application that serves as a facade for a set of webservices. The application accepts SOAP requests and transforms these requests into a format understandable by the underlying web service. There are several such services, each with its own interface. Some accept SOAP, some HTTP POST, some... other formats not mentioned in polite society.
I need to document how we map the fields from our SOAP calls to the fields for these other formats. Before everyone cries "XSLT" I must mention that the notation must be human-friendly. Ideally it would be something Excel-able.
Has anyone encountered this sort of problem before? How did you solve it? Is there a human-friendly notation for tree-to-tree transformations that can fit on a spreadsheet?
I've had to do just this. The way I did it was to just start writing, following the hierarchical structure.
I eventually would find that I was repeating myself. An example was that certain elements had a common set of attributes. I would pull the documentation of that common set up before the sections on the specific elements. Same thing with documentation of handling of specific simpleTypes.
Eventually, there was even some high level discussion on the overall flow and "philosophy" of the transformation. But I let it all happen bit by bit, fixing it as I became bored with repetition.
That said, I'm a developer, not a tech writer.
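If a spreadsheet is the goal, one layout that tends to survive the trip into Excel is one row per leaf field, with columns for the source path, the target field, and transformation notes. A made-up example:

Source (SOAP request)       Target (backend format)   Notes
/Order/Customer/Name        customer_name             copy as-is
/Order/Items/Item[*]/SKU    items[n].sku              one row per Item
/Order/Total                amount_cents              multiply by 100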
I haven't really found anything so far, but I've found pointers to many libraries that help transform objects of one type to another in Java. For reference, I'm listing the most promising ones here, all doing some kind of JavaBean to JavaBean conversion:
Transmorph
EZMorph
Dozer