How to avoid sending 2 duplicate POST requests to a webservice - web-services

I send a POST request to create an object. That object is created successfully on the server, but I cannot receive the response (dropped somewhere), so I try to send the POST request again (and again). The result is there are many duplicated objects on the server side.
What is the official way to handle that issue? I think it is a very common problem, but I don't know its exact name, so cannot google it. Thanks.

In REST terminology, which is how interfaces where POST is used to create an object (and PUT to modify, DELETE to delete and GET to retrieve) are called, the POST operation is attributed un-'safe' and non-'idempotent, because the second operation of every other type of petition has no effect in the collection of objects.
I doubt there is an "official" way to deal with this, but there are probably some design patterns to deal with it. For example, these two alternatives may solve this problem in certain scenarios:
Objects have unicity constraints. For example, a record that stores a unique username cannot be duplicated, since the database will reject it.
Issue an one-time use token to each client before it makes the POST request, usually when the client loads the page with the input form. The first POST creates an object and marks the token as used. The second POST will see that the token is already used and you can answer with a "Yes, yes, ok, ok!" error or success message.
Useful link where you can read more about REST.

It is unreliable to fix these issues on the client only.
In my experience, RESTful services with lots of traffic are bound to receive duplicate incoming POST requests unintentionally - e.g. sometimes a user will click 'Signup' and two requests will be sent simultaneously; this should be expected and handled by your backend service.
When this happens, two identical users will be created even if you check for uniqueness on the User model. This is because unique checks on the model are handled in-memory using a full-table scan.
Solution: these cases should be handled in the backend using unique checks and SQL Server Unique Indices.

Related

How to invalidate AWS APIGateway cache

We have a service which inserts into dynamodb certain values. For sake of this question let's say its key:value pair i.e., customer_id:customer_email. The inserts don't happen that frequently and once the inserts are done, that specific key doesn't get updated.
What we have done is create a client library which, provided with customer_id will fetch customer_email from dynamodb.
Given that customer_id data is static, what we were thinking is to add cache to the table but one thing which we are not sure that what will happen in the following use-case
client_1 uses our library to fetch customer_email for customer_id = 2.
The customer doesn't exist so API Gateway returns not found
APIGateway will cache this response
For any subsequent calls, this cached response will be sent
Now another system inserts customer_id = 2 with its email id. This system doesn't know if this response has been cached previously or not. It doesn't even know that any other system has fetched this specific data. How can we invalidate cache for this specific customer_id when it gets inserted into dynamodb
You can send a request to the API endpoint with a Cache-Control: max-age=0 header which will cause it to refresh.
This could open your application up to attack as a bad actor can simply flood an expensive endpoint with lots of traffic and buckle your servers/database. In order to safeguard against that it's best to use a signed request.
In case it's useful to people, here's .NET code to create the signed request:
https://gist.github.com/secretorange/905b4811300d7c96c71fa9c6d115ee24
We've built a Lambda which takes care of re-filling cache with updated results. It's a quite manual process, with very little re-usable code, but it works.
Lambda is triggered by the application itself following application needs. For example, in CRUD operations the Lambda is triggered upon successful execution of POST, PATCH and DELETE on a specific resource, in order to clear the general GET request (i.e. clear GET /books whenever POST /book succeeded).
Unfortunately, if you have a View with a server-side paginated table you are going to face all sorts of issues because invalidating /books is not enough since you actually may have /books?page=2, /books?page=3 and so on....a nightmare!
I believe APIG should allow for more granular control of cache entries, otherwise many use cases aren't covered. It would be enough if they would allow to choose a root cache group for each request, so that we could manage cache entries by group rather than by single request (which, imho, is also less common).
Did you look at this https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-caching.html ?
There is way to invalidate entire cache or a particular cache entry

Designing RESTful API for Invoking process methods

I would like to know how do design the RESTful web service for process methods. For example I want to make a REST Api for ProcessPayroll for given employee id. Since ProcessPayroll is time consuming job, I don't need any response from the method call but just want to invoke the ProcessPayroll method asynchronously and return. I can't use ProcessPayroll in the URL since it is not a resource and it is not a verb. So I thought that, I can go with the below approach
Request 1
http://www.example.com/payroll/v1.0/payroll_processor POST
body
{
"employee" : "123"
}
Request 2
http://www.example.com/payroll/v1.0/payroll_processor?employee=123 GET
Which one of the above approach is correct one? Is there any Restful API Design guidelines to make a Restful service for process methods and functions?
Which one of the above approach is correct one?
Of the two, POST is closest.
The problem with using GET /mumble is that the specification of the GET method restricts its use to operations that are "safe"; which is to say that they don't change the resource in any way. In other words, GET promises that a resource can be pre-fetched, just in case it is needed, by the user agent and the caches along the way.
Is there any Restful API Design guidelines to make a Restful service for process methods and functions?
Jim Webber has a bunch of articles and talks that discuss this sort of thing. Start with How to GET a cup of coffee.
But the rough plot is that your REST api acts as an integration component between the process and the consumer. The protocol is implemented as the manipulation of one or more resources.
So you have some known bookmark that tells you how to submit a payroll request (think web form), and when you submit that request (typically POST, sometimes PUT, details not immediately important) the resource that handles it as a side effect (1) starts an instance of ProcessPayroll from the data in your message, (2) maps that instance to a new resource in its namespace and (3) redirects you to the resource that tracks your payroll instance.
In a simple web api, you just keep refreshing your copy of this new resource to get updates. In a REST api, that resource will be returning a hypermedia representation of the resource that describes what actions are available.
As Webber says, HTTP is a document transport application. Your web api handles document requests, and as a side effect of that handling interacts with your domain application protocol. In other words, a lot of the resources are just messages....
We've come up with the similar solution in my project, so don't blame if my opinion is wrong - I just want to share our experience.
What concerns the resource itself - I'd suggest something like
http://www.example.com/payroll/v1.0/payrollRequest POST
As the job is supposed to be run at the background, the api call should return Accepted (202) http code. That tells the user that the operation will take a lot time. However you should return a payrollRequestId unique identifier (Guid for example) to allow users to get the posted resource later on by calling:
http://www.example.com/payroll/v1.0/payrollRequest/{payrollRequestId} GET
Hope this helps
You decide the post and get on the basis of the API work-
If your Rest API create any new in row DB(means new resource in DB) , then you have to go for POST. In your case if your payroll process method create any resource then you have to choose to POST
If your Rest API do both, create and update the resources. Means ,if your payroll method process the data and update it and create a new data , then go for PUT
If your Rest API just read the data, go for GET. But as I think from your question your payroll method not send any data.So GET is not best for your case.
As I think your payroll method is doing both thing.
Process the data , means updating the data and
Create new Data , means creating the new row in DB
NOTE - One more thing , the PUT is idempotent and POST is not.Follow the link PUT vs POST in REST
So, you have to go for PUT method.

Web service differences between REST and RPC

I have a web service that accepts JSON parameters and have specific URLs for methods, e.g.:
http://IP:PORT/API/getAllData?p={JSON}
This is definitely not REST as it is not stateless. It takes cookies into account and has its own session.
Is it RPC? What is the difference between RPC and REST?
Consider the following example of HTTP APIs that model orders being placed in a restaurant.
The RPC API thinks in terms of "verbs", exposing the restaurant functionality as function calls that accept parameters, and invokes these functions via the HTTP verb that seems most appropriate - a 'get' for a query, and so on, but the name of the verb is purely incidental and has no real bearing on the actual functionality, since you're calling a different URL each time. Return codes are hand-coded, and part of the service contract.
The REST API, in contrast, models the various entities within the problem domain as resources, and uses HTTP verbs to represent transactions against these resources - POST to create, PUT to update, and GET to read. All of these verbs, invoked on the same URL, provide different functionality. Common HTTP return codes are used to convey status of the requests.
Placing an Order:
RPC: http://MyRestaurant:8080/Orders/PlaceOrder (POST: {Tacos object})
REST: http://MyRestaurant:8080/Orders/Order?OrderNumber=asdf (POST: {Tacos object})
Retrieving an Order:
RPC: http://MyRestaurant:8080/Orders/GetOrder?OrderNumber=asdf (GET)
REST: http://MyRestaurant:8080/Orders/Order?OrderNumber=asdf (GET)
Updating an Order:
RPC: http://MyRestaurant:8080/Orders/UpdateOrder (PUT: {Pineapple Tacos object})
REST: http://MyRestaurant:8080/Orders/Order?OrderNumber=asdf (PUT: {Pineapple Tacos object})
Example taken from sites.google.com/site/wagingguerillasoftware/rest-series/what-is-restful-rest-vs-rpc
You can't make a clear separation between REST or RPC just by looking at what you posted.
One constraint of REST is that it has to be stateless. If you have a session then you have state so you can't call your service RESTful.
The fact that you have an action in your URL (i.e. getAllData) is an indication towards RPC. In REST you exchange representations and the operation you perform is dictated by the HTTP verbs. Also, in REST, Content negotiation isn't performed with a ?p={JSON} parameter.
Don't know if your service is RPC, but it is not RESTful. You can learn about the difference online, here's an article to get you started: Debunking the Myths of RPC & REST. You know better what's inside your service so compare it's functions to what RPC is and draw your own conclusions.
As others have said, a key difference is that REST URLs are noun-centric and RPC URLs are verb-centric. I just wanted to include this clear table of examples demonstrating that:
---------------------------+-------------------------------------+--------------------------
Operation | RPC (operation) | REST (resource)
---------------------------+-------------------------------------+--------------------------
Signup | POST /signup | POST /persons
---------------------------+-------------------------------------+--------------------------
Resign | POST /resign | DELETE /persons/1234
---------------------------+-------------------------------------+--------------------------
Read person | GET /readPerson?personid=1234 | GET /persons/1234
---------------------------+-------------------------------------+--------------------------
Read person's items list | GET /readUsersItemsList?userid=1234 | GET /persons/1234/items
---------------------------+-------------------------------------+--------------------------
Add item to person's list | POST /addItemToUsersItemsList | POST /persons/1234/items
---------------------------+-------------------------------------+--------------------------
Update item | POST /modifyItem | PUT /items/456
---------------------------+-------------------------------------+--------------------------
Delete item | POST /removeItem?itemId=456 | DELETE /items/456
---------------------------+-------------------------------------+--------------------------
Notes
As the table shows, REST tends to use URL path parameters to identify specific resources
(e.g. GET /persons/1234), whereas RPC tends to use query parameters for function inputs
(e.g. GET /readPerson?personid=1234).
Not shown in the table is how a REST API would handle filtering, which would typically involve query parameters (e.g. GET /persons?height=tall).
Also not shown is how with either system, when you do create/update operations, additional data is probably passed in via the message body (e.g. when you do POST /signup or POST /persons, you include data describing the new person).
Of course, none of this is set in stone, but it gives you an idea of what you are likely to encounter and how you might want to organize your own API for consistency. For further discussion of REST URL design, see this question.
It is RPC using http. A correct implementation of REST should be different from RPC. To have a logic to process data, like a method/function, is RPC. getAllData() is an intelligent method. REST cannot have intelligence, it should be dumb data that can be queried by an external intelligence.
Most implementation I have seen these days are RPC but many mistakenly call it as REST. REST with HTTP is the saviour and SOAP with XML the villain. So your confusion is justified and you are right, it is not REST. But keep in mind that REST is not new(2000) eventhough SOAP/XML is old, json-rpc came later(2005).
Http protocol does not make an implementation of REST. Both REST(GET, POST, PUT, PATCH, DELETE) and RPC(GET + POST) can be developed through HTTP(eg:through a web API project in visual studio for example).
Fine, but what is REST then?
Richardson maturity model is given below(summarized). Only level 3 is RESTful.
Level 0: Http POST
Level 1: each resource/entity has a URI (but still only POST)
Level 2: Both POST and GET can be used
Level 3(RESTful): Uses HATEOAS (hyper media links) or in other words self
exploratory links
eg: level 3(HATEOAS):
Link states this object can be updated this way, and added this way.
Link states this object can only be read and this is how we do it.
Clearly, sending data is not enough to become REST, but how to query the data, should be mentioned too. But then again, why the 4 steps? Why can't it be just Step 4 and call it REST? Richardson just gave us a step by step approach to get there, that is all.
You've built web sites that can be used by humans. But can you also
build web sites that are usable by machines? That's where the future
lies, and RESTful Web Services shows you how to do it.
This book RESTful Web Services helps
A very interesting read RPC vs REST
REST is best described to work with the resources, where as RPC is more about the actions.
REST
stands for Representational State Transfer. It is a simple way to organize interactions between independent systems.
RESTful applications commonly use HTTP requests to post data (create and/or update), read data (e.g., make queries), and delete data. Thus, REST can use HTTP for all four CRUD (Create/Read/Update/Delete) operations.
RPC
is basically used to communicate across the different modules to serve user requests.
e.g. In openstack like how nova, glance and neutron work together when booting a virtual machine.
The URL shared looks like RPC endpoint.
Below are examples for both RPC and REST. Hopefully this helps in understanding when they can be used.
Lets consider an endpoint that sends app maintenance outage emails to customers.
This endpoint preforms one specific action.
RPC
POST https://localhost:8080/sendOutageEmails
BODY: {"message": "we have a scheduled system downtime today at 1 AM"}
REST
POST https://localhost:8080/emails/outage
BODY: {"message": "we have a scheduled system downtime today at 1 AM"}
RPC endpoint is more suitable to use in this case. RPC endpoints usually are used when the API call is performing single task or action. We can obviously use REST as shown, but the endpoint is not very RESTful since we are not performing operations on resources.
Now lets look at an endpoint that stores some data in the database.(typical CRUD operation)
RPC
POST https://localhost:8080/saveBookDetails
BODY: {"id": "123", "name": "book1", "year": "2020"}
REST
POST https://localhost:8080/books
BODY: {"id": "123", "name": "book1", "year": "2020"}
REST is much better for cases like this(CRUD). Here, read(GET) or delete(DELETE) or update(PUT) can be done by using appropriate HTTP methods. Methods decide the operation on the resources(in this case 'books').
Using RPC here is not suitable as we need to have different paths for each CRUD operation(/getBookDetails, /deleteBookDetails, /updateBookDetails) and this has to be done for all resources in the application.
To summarize,
RPC can be used for endpoints that perform single specific action.
REST for endpoints where the resources need CRUD operations.
Slack uses this style of HTTP RPC Web API's - https://api.slack.com/web
There are bunch of good answers here. I would still refer you to this google blog as it does a really good job of discussing the differences between RPC & REST and captures something that I didn't read in any of the answers here.
I would quote a paragraph from the same link that stood out to me:
REST itself is a description of the design principles that underpin HTTP and the world-wide web. But because HTTP is the only commercially important REST API, we can mostly avoid discussing REST and just focus on HTTP. This substitution is useful because there is a lot of confusion and variability in what people think REST means in the context of APIs, but there is much greater clarity and agreement on what HTTP itself is. The HTTP model is the perfect inverse of the RPC model—in the RPC model, the addressable units are procedures, and the entities of the problem domain are hidden behind the procedures. In the HTTP model, the addressable units are the entities themselves and the behaviors of the system are hidden behind the entities as side-effects of creating, updating, or deleting them.
I would argue thusly:
Does my entity hold/own the data? Then RPC: here is a copy of some of my data, manipulate the data copy I send to you and return to me a copy of your result.
Does the called entity hold/own the data? Then REST: either (1) show me a copy of some of your data or (2) manipulate some of your data.
Ultimately it is about which "side" of the action owns/holds the data. And yes, you can use REST verbiage to talk to an RPC-based system, but you will still be doing RPC activity when doing so.
Example 1: I have an object that is communicating to a relational database store (or any other type of data store) via a DAO. Makes sense to use REST style for that interaction between my object and the data access object which can exist as an API. My entity does not own/hold the data, the relational database (or non-relational data store) does.
Example 2: I need to do a lot of complex math. I don't want to load a bunch of math methods into my object, I just want to pass some values to something else that can do all kinds of math, and get a result. Then RPC style makes sense, because the math object/entity will expose to my object a whole bunch of operations. Note that these methods might all be exposed as individual APIs and I might call any of them with GET. I can even claim this is RESTful because I am calling via HTTP GET but really under the covers it is RPC. My entity owns/holds the data, the remote entity is just performing manipulations on the copies of the data that I sent to it.
This is how I understand and use them in different use cases:
Example: Restaurant Management
use-case for REST: order management
- create order (POST), update order (PATCH), cancel order (DELETE), retrieve order (GET)
- endpoint: /order?orderId=123
For resource management, REST is clean. One endpoint with pre-defined actions. It can be seen a way to expose a DB (Sql or NoSql) or class instances to the world.
Implementation Example:
class order:
on_get(self, req, resp): doThis.
on_patch(self, req, resp): doThat.
Framework Example: Falcon for python.
use-case for RPC: operation management
- prepare ingredients: /operation/clean/kitchen
- cook the order: /operation/cook/123
- serve the order /operation/serve/123
For analytical, operational, non-responsive, non-representative, action-based jobs, RPC works better and it is very natural to think functional.
Implementation Example:
#route('/operation/cook/<orderId>')
def cook(orderId): doThis.
#route('/operation/serve/<orderId>')
def serve(orderId): doThat.
Framework Example: Flask for python
Over HTTP they both end up being just HttpRequest objects and they both expect a HttpResponse object back. I think one can continue coding with that description and worry about something else.

Updating a hit counter when an image is accessed in Django

I am working on doing some simple analytics on a Django webstite (v1.4.1). Seeing as this data will be gathered on pretty much every server request, I figured the right way to do this would be with a piece of custom middleware.
One important metric for the site is how often given images are accessed. Since each image is its own object, I thought about using django-hitcount, but figured that was unnecessary for what I was trying to do. If it proves easier, I may use it though.
The current conundrum I face is that I don't want to query the database and look for a given object for every HttpRequest that occurs. Instead, I would like to wait until a successful response (indicated by an HttpResponse.status of 200 or whatever), and then query the server and update a hit field for the corresponding image. The reason the only way to access the path of the image is in process_request, while the only way to access the status code is in process_response.
So, what do I do? Is it as simple as creating a class variable that can hold the path and then lookup the file once the response code of 200 is returned, or should I just use django-hitcount?
Thanks for your help
Set up a cron task to parse your Apache/Nginx/whatever access logs on a regular basis, perhaps with something like pylogsparser.
You could use memcache to store the counters and then periodically persist them to the database. There are risks that memcache will evict the value before it's been persisted but this could be acceptable to you.
This article provides more information and highlights a risk arising when using hosted memcache with keys distributed over multiple servers. http://bjk5.com/post/36567537399/dangers-of-using-memcache-counters-for-a-b-tests

Multiple Insert/Update business objects in REST

I have a requirement where I will receive multiple business objects from the client and my service has to insert/update all of them.
Can I implement a REST webservice which will have a POST method and will accept a list of business objects and will update/insert all of them into the system? I have read that we should use a POST method to create a new entry. Can we use POST method for this kind of scenario wherein we can create/update multiple entries at one go?
My other query is, for a POST method, is it RESTful to return a business object instead of returning a RESPONSE object?
REST is about scalability; scalability is about cachability; cachability is about individual resources, not sets of them. A post probably shouldn't return anything other than a possible redirect to a GET that returns the resource just posted. Data should be fetched with a GET, GET's are cachable. POST, PUT, DELETE are actions, not queries, you don't get data with them other than what they may include to point you at some new resource via response headers.
Yes you can use POST to accept a document that will cause the creation of a list of business objects. It is not the most obvious way to do it, but it can be done RESTfully. See my answer to your other question.
A POST can return a document that represents information about a business object. It can't really return a business object directly, because HTTP doesn't return objects, it returns streams of bytes that can be interpreted using the content-type header.