Handle side effects caused by duplicate POST requests - web-services

Let's say we have a web service that creates and updates meeting room bookings. Updates can change various aspects of a booking, such as time and room number.
Suppose that a user's network connection to the service may be unreliable (e.g. a mobile network), and two users, A and B, try to update the same booking one after the other.
User A sends a POST request to change the meeting time to 2pm. The request reaches the server and the server processes it successfully. However, the response back to User A is lost due to the network connection, and User A thinks the request failed.
Before User A tries again, User B sends her request to change the meeting time to 2:30pm. It succeeds, and the response reaches User B successfully.
Now User A retries the same request (perhaps automatically), and this time both the request and the response succeed without a problem. In other words, the meeting time is changed back to 2pm.
In the hypothetical scenario above, User A's duplicated request causes User B's change to be overwritten and results in an incorrect state on the server side.
One possible but naive solution is to set an ID for every request on the client, where the ID does not change if a request is simply retried or re-sent. The server then maintains a collection of received request IDs and checks for duplicates.
What are the better techniques or methods for solving this problem?

This is a common problem with concurrent users. One way to solve it is to enforce conditional requests, requiring clients to send the If-Unmodified-Since header with the Last-Modified value of the resource they are attempting to change. That guarantees nobody else changed it between the last time they checked and now. In your case, this would prevent A from overwriting B's changes.
For instance, user A wants to change the meeting time. A sends a GET request for the meeting resource and keeps the value of the Last-Modified response header. Then A sends the POST request with that Last-Modified value in the If-Unmodified-Since header. Following your example, this request actually succeeds, but the response is lost.
If A repeats the request immediately, it will fail with 412 Precondition Failed, since the condition is no longer valid.
If in the meantime B does the same thing and changes the meeting time again, when A tries to repeat the request, without checking for the current Last-Modified value corresponding to B's changes, it also fails with 412 Precondition Failed.
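The conditional-request flow described above can be sketched without a real HTTP stack. This is a minimal simulation, not a definitive implementation: the `Booking` class, the `update_booking` function, and the choice to answer 428 when the header is missing are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


class Booking:
    """Hypothetical in-memory stand-in for the meeting resource."""

    def __init__(self, time, last_modified):
        self.time = time
        self.last_modified = last_modified  # UTC, whole seconds

    def last_modified_header(self):
        # Value a GET would return in the Last-Modified header.
        return format_datetime(self.last_modified, usegmt=True)


def update_booking(booking, new_time, if_unmodified_since, now):
    """Apply the update only if the booking is unchanged since the given date."""
    if if_unmodified_since is None:
        return 428  # assumption: the server insists on a condition
    if booking.last_modified > parsedate_to_datetime(if_unmodified_since):
        return 412  # Precondition Failed: someone changed it in the meantime
    booking.time = new_time
    booking.last_modified = now
    return 200


t0 = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
booking = Booking("1pm", t0)

seen_by_a = booking.last_modified_header()  # A's GET
first = update_booking(booking, "2pm", seen_by_a, t0 + timedelta(seconds=10))
seen_by_b = booking.last_modified_header()  # B's GET, after A's change
b_result = update_booking(booking, "2:30pm", seen_by_b, t0 + timedelta(seconds=20))
retry = update_booking(booking, "2pm", seen_by_a, t0 + timedelta(seconds=30))
print(first, b_result, retry, booking.time)
```

A's first request and B's request both succeed, while A's retry with the stale date is rejected, so the booking stays at 2:30pm. Note that HTTP dates have one-second resolution, so two changes within the same second cannot be told apart this way.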

Related

Request data seemingly dirty in multithreaded flask app

We are seeing a random error that seems to be caused by two requests' data getting mixed up. We receive a request for quoting shipping costs on an Order, but the request fails because the requested Order is not accessible by the requesting account. I'm looking for anyone who can provide an inkling of what might be happening here; I haven't found anything on Google, the official Flask help channels, or SO that looks like what we're experiencing.
We're deployed on AWS, with apache, mod_wsgi, 1 process, 15 threads, about 10 instances.
Here's the code that sends the email:
msg = f"Order ID {self.shipping.order.id} is not valid for this Account {self.user.account_id}"
body = f"Error:<br/>{msg}<br/>Request Data:<br/>{request.data}<br/>Headers:<br/>{request.headers}"
send_email(msg, body, "devops#*******.com")
request_data = None
The problem is that in that scenario we email ourselves the error and the request data, and the request data we're getting would, in many cases, never have landed in that particular piece of code. It can be a request from the frontend to get the current user's settings, for example, which makes no reference to any orders, never mind trying to get a shipping quote for one.
Comparing the application logs with Apache's access_log, we see that, in all cases, we got two requests on the same instance: one requesting the quote, and another which is the request that actually gets logged. We don't know whether these two requests are processed by the same thread in rapid succession or by different threads, but they come so close together that I think the latter is much more probable. So far we have no way of unambiguously tying the access_log entries to the application logging, so we don't know which of the requests is logging the error, but the fact is that we're getting routed to a view that does not correspond to the request's content (i.e., we're not sure whether the quoting request is getting the wrong request object, or whether the other one is getting routed to the wrong view).
Another fact that is of interest is that we use graphql, so part of the routing is done after flask/werkzeug do theirs, but the body we get from flask.request at the moment the error shows up does not correspond with the graphql function/mutation that gets executed. But this also happens in views mapped directly through flask. The user is looked up by the flask-login workflow at the very beginning, and it corresponds to the "bad" request (i.e., the one not for quoting).
The actual issue was a bug on one of python-graphql's libraries (promise), not on Flask, werkzeug or apache. It was not the request data that was "moving" to a different thread, but a different thread trying to resolve the promise for a query that was supposed to be handled elsewhere.

Should idempotent service respond with error after first call?

I'm writing a service layer for a DDD app.
Services are exposed through JSON-RPC over WSS.
I'm not sure how to respond to redundant calls to a service.
Some facts about the system:
1. All requests must be completed within a specific time or a timeout exception occurs.
2. If the system is under heavy load, it may decide to discard a request (visible as a timeout).
3. If the system is under heavy load, some messages may expire in the queue (visible as a timeout).
4. Even if a request reaches its destination, the ACK may not reach the user in time (visible as a timeout).
5. The end user has the right to re-invoke a method if the ACK didn't arrive in time.
6. No guarantees on request completion are given.
Thus the need for idempotency.
Problem arises if we consider [4]+[5] implications:
User invokes method setFoo(Bar).
Entity was created but ACK didn't make it on time.
User receives timeout and assumes that he should try again, so he re-invokes setFoo(Bar).
Entity already exists -> hmm...
Question is: Should user get ACK or Error(I've already done that mate...)?
An idempotent operation should have the same behaviour when it is called multiple times. This suggests that the return value should be the same as well, so in the scenario you are describing above, the user should get ACK.
Consider the alternative; if you return an error to the user, then how should the user respond? What "error handling" is appropriate?
You can make an argument for a response of ACK(I've already done that mate...) but the part in brackets should be a purely optional informative field, not something that affects how the user processes the response.
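A minimal sketch of that behaviour, assuming a hypothetical in-memory `FooService` whose `set_foo` takes an entity ID and a value (neither name comes from the question). The repeated call gets the same ACK, and the note is purely informational:

```python
class FooService:
    """Hypothetical service with an idempotent set operation."""

    def __init__(self):
        self._foos = {}

    def set_foo(self, entity_id, bar):
        # Detect a repeat of a call that has already been applied.
        already_applied = entity_id in self._foos and self._foos[entity_id] == bar
        self._foos[entity_id] = bar
        response = {"status": "ACK"}
        if already_applied:
            # Optional hint only; callers must not branch on it.
            response["note"] = "already applied"
        return response


service = FooService()
first = service.set_foo("entity-1", "Bar")
second = service.set_foo("entity-1", "Bar")  # retry after a lost ACK
print(first["status"], second["status"])
```

Both calls report ACK, so a client that retried after a timeout needs no special error handling.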

Dealing with a web api failure

If I have a web API service (Order Notification) that a third-party client calls periodically (every 10 minutes) to get new orders it has not yet received (they must call in to us; we cannot push to them), how do I deal with failures?
For example, there are 10 new Orders the client has not received since they last called in. The client calls into our Order Notification service. We retrieve the orders we have not sent (10 in this case). We update these 10 Orders as sent and return the response to the client.
However, the client did not receive the response (something happened after it left us, e.g. an HTTP timeout or something else).
So now we have a problem where on our side we have marked the orders as sent but the client never received them.
Any thoughts on how to solve this?
Just an idea, can you assign the caller some sort of identifier and when the caller succeeds it replies back saying it has acknowledged the request? The server will never know if something failed on the client side unless the client reports it.
For example, when caller A calls in for the requests it may do something like this:
call -> http://server/requests
The server replies with some XML that contains the result set for this caller along with a unique identifier, which it tracks to know whether that particular call was acknowledged (you can time out this identifier after a reasonable period of time).
When the client gets the response, it can call back again:
call -> http://server/requestComplete?id=[generatedID]
and the server marks it successful.
Lots of APIs require some sort of identification token, so it would already lend itself well to this kind of send/ack messaging system.
If you have access to both sides of the system, you could add a "received" request: once the client picking up the data has received it, it makes a request to the original host to say that the data arrived successfully.
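The send/ack idea from the answers above might look roughly like this. It is a sketch under stated assumptions: the `OrderNotifier` class and its method names are hypothetical, storage is in memory, and the identifier timeout mentioned above is left out. Orders are only marked as sent once the client acknowledges the batch that carried them.

```python
import uuid


class OrderNotifier:
    """Hypothetical server side of a fetch/acknowledge exchange."""

    def __init__(self):
        self._unacknowledged = {}  # order_id -> order payload
        self._batches = {}         # batch_id -> order ids handed out

    def add_order(self, order_id, payload):
        self._unacknowledged[order_id] = payload

    def fetch(self):
        # Hand out everything not yet acknowledged, tagged with a new batch id.
        batch_id = uuid.uuid4().hex
        self._batches[batch_id] = list(self._unacknowledged)
        return batch_id, dict(self._unacknowledged)

    def acknowledge(self, batch_id):
        # Mark the orders of this batch as sent; unknown ids are ignored.
        order_ids = self._batches.pop(batch_id, None)
        if order_ids is None:
            return False
        for order_id in order_ids:
            self._unacknowledged.pop(order_id, None)
        return True


notifier = OrderNotifier()
notifier.add_order(1, "order one")
notifier.add_order(2, "order two")

lost_batch, _ = notifier.fetch()            # response never reaches the client
batch_id, orders = notifier.fetch()         # next poll returns the same orders
acknowledged = notifier.acknowledge(batch_id)
_, remaining = notifier.fetch()
print(sorted(orders), acknowledged, remaining)
```

Because an unacknowledged batch is simply offered again, the client may see an order twice and should deduplicate by order ID.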

HTTP Request signature without session

I am thinking of a REST web service that ensures, for every request sent to it, that:
The request was generated by the user who claims it;
The request has not been modified by someone else (URI/method/content/date);
For GET requests, it should be possible to generate a URI with enough information in it to check the signature and set an expiration date. That way a user can delegate temporary READ permissions on a resource to a collaborator for a limited time period with a generated URI.
Clients are authenticated with an ID and a content signature based on their password.
There should be no session at all, and so no server state! The server and the client share a secret key (a password).
After thinking about it and talking with some really nice folks, it seems no existing REST service does this as simply as my use case requires. (HTTP Digest and OAuth can do this, but with server state, and they are very chatty.)
So I imagined one, and I'm asking for your comments on how it should be designed (I will release it as open source and hope it can help others).
The service uses a custom "Content-signature" header to store credentials. An authenticated request should contain this header:
Content-signature: <METHOD>-<USERID>-<SIGNATURE>
<METHOD> is the sign method used, in our case SRAS.
<USERID> stands for the user ID mentioned earlier.
<SIGNATURE> = SHA2(SHA2(<PASSWORD>):SHA2(<REQUEST_HASH>));
<REQUEST_HASH> = <HTTP_METHOD>\n
<HTTP_URI>\n
<REQUEST_DATE>\n
<BODY_CONTENT>;
A request is invalidated 10 minutes after it has been created.
For example a typical HTTP REQUEST would be :
POST /ressource HTTP/1.1
Host: www.elphia.fr
Date: Sun, 06 Nov 1994 08:49:37 GMT
Content-signature: SRAS-62ABCD651FD52614BC42FD-760FA9826BC654BC42FD
{ test: "yes" }
The server will answer :
401 Unauthorized
OR
200 OK
Variables would be :
<USERID> = 62ABCD651FD52614BC42FD
<REQUEST_HASH> = POST\n
/ressource\n
Sun, 06 Nov 1994 08:49:37 GMT\n
{ test: "yes" }\n
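A minimal sketch of how a client might compute the header described above. Several details are assumptions, since the description leaves them open: SHA2 is taken to mean SHA-256, digests are hex-encoded, the two inner digests are joined with a literal ":", and the function names are hypothetical.

```python
import hashlib


def sha256_hex(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def request_hash(http_method, uri, date, body):
    # <HTTP_METHOD>\n<HTTP_URI>\n<REQUEST_DATE>\n<BODY_CONTENT>
    return "\n".join([http_method, uri, date, body])


def content_signature(user_id, password, http_method, uri, date, body, method="SRAS"):
    # SHA2(SHA2(<PASSWORD>):SHA2(<REQUEST_HASH>))
    inner = sha256_hex(password) + ":" + sha256_hex(request_hash(http_method, uri, date, body))
    return f"{method}-{user_id}-{sha256_hex(inner)}"


header = content_signature(
    "62ABCD651FD52614BC42FD",
    "example-password",  # placeholder secret for illustration
    "POST",
    "/ressource",
    "Sun, 06 Nov 1994 08:49:37 GMT",
    '{ test: "yes" }',
)
print("Content-signature:", header)
```

The server would recompute the same value from its stored password and the received request, and answer 401 Unauthorized on a mismatch.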
URI Parameters
Some parameters can be added to the URI (they override the header information):
_sras.content-signature=<METHOD>-<USERID>-<SIGNATURE> : Put the credentials in the URI, not in the HTTP header. This allows a user to share a signed request;
_sras.date=Sun, 06 Nov 1994 08:49:37 GMT (request date*) : The date when the request was created.
_sras.expires=Sun, 06 Nov 1994 08:49:37 GMT (expire date*) : Tell the server the request should not expire before the specified date
*date format : http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.18
Thanks for your comments.
There are several issues that you need to consider when designing a signature protocol. Some of these issues might not apply to your particular service:
1- It is customary to add an "X-Namespace-" prefix to non-standard headers; in your case you could name your header something like "X-SRAS-Content-Signature".
2- The Date header might not provide enough resolution for the nonce value. I would therefore advise using a timestamp with at least 1 millisecond of resolution.
3- If you do not store at least the last nonce, one could still replay a message within the 10-minute window, which is probably unacceptable for a POST request (it could create multiple instances with the same values in your REST web service). This should not be a problem for the GET, PUT or DELETE verbs.
However, on a PUT, this could be used for a denial-of-service attack by forcing many updates of the same object within the proposed 10-minute window. A similar problem exists for GET and DELETE.
You therefore probably need to store at least the last used nonce associated with each user ID and share this state between all your authentication servers in real time.
4- This method also requires that the client and server clocks be synchronized with less than 10 minutes of skew. This can be tricky to debug, or impossible to enforce if you have AJAX clients whose clocks you do not control. It also requires all timestamps to be set in UTC.
An alternative is to drop the 10-minute window requirement but verify that timestamps increase monotonically, which again requires storing the last nonce. This is still a problem if the client's clock is set back to a date prior to the last used nonce. Access would be denied until the client's clock passes the last nonce or the server's nonce state is reset.
A monotonically increasing counter is not an option for clients that cannot store state, unless the client can request the last used nonce from the server. This would be done once at the beginning of each session, and the counter would then be incremented on each request.
5- You also need to pay attention to retransmissions due to network errors. If the client has not received a TCP ACK for the last message before the TCP connection dropped, you cannot assume that the server did not receive that message. Therefore the nonce needs to be incremented for each retransmission above the TCP level, and the signature recalculated with the new nonce. A message number also needs to be added to prevent double execution on the server: a double POST would result in 2 objects being created.
6- You also need to sign the user ID; otherwise, an attacker might be able to replay the same message for all users whose nonces have not yet reached that of the replayed message.
7- Your method does not guarantee to the client that the server is authentic and has not been DNS-hijacked. Server authentication is usually considered important for secure communications. This service could be provided by signing responses from the server, using the same nonce as that of the request.
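Points 3 and 4 come down to the server remembering the last nonce it accepted for each user. A minimal sketch, assuming integer millisecond timestamps as nonces and a hypothetical in-memory `NonceRegistry` (a real deployment would have to share this state between servers, as noted above):

```python
class NonceRegistry:
    """Hypothetical per-user store of the last accepted nonce."""

    def __init__(self):
        self._last_nonce = {}

    def accept(self, user_id, nonce):
        # Accept only nonces strictly greater than the last one seen.
        last = self._last_nonce.get(user_id)
        if last is not None and nonce <= last:
            return False  # replayed or out-of-order message
        self._last_nonce[user_id] = nonce
        return True


registry = NonceRegistry()
results = [
    registry.accept("alice", 1000),  # first message
    registry.accept("alice", 1000),  # replay of the same message
    registry.accept("alice", 999),   # client clock went backwards
    registry.accept("alice", 1001),  # retransmission with a fresh nonce
    registry.accept("bob", 1000),    # nonces are tracked per user
]
print(results)
```

The nonce check only rejects replays; preventing double execution of a retransmitted POST still needs the separate message number from point 5.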
I would note that you can accomplish this with OAuth, most notably "2-legged OAuth" where client and server share a secret. See https://www.rfc-editor.org/rfc/rfc5849#page-14. In your case, you want to omit the oauth_token parameter and probably use the HMAC-SHA1 signature method. There's nothing particularly chatty about this; you don't need to go through the OAuth token acquisition flows to do things this way. This has the advantage of being able to use any of several existing open source OAuth libraries.
As far as server-side state, you do need to keep track of what secrets go with which clients, as well as which nonces have been used recently (to prevent replay attacks). You can skip the nonce checking / lifetimes if you run things over HTTPS, but if you're going to do that, then HTTPS + Basic Auth gives you everything you described without having to write new software.
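For illustration, signing with a shared secret boils down to a keyed MAC over the parts of the request that must not change. The sketch below uses Python's standard `hmac` module with SHA-256; it is not the OAuth signature base string algorithm, and the function names and the field layout are assumptions.

```python
import hashlib
import hmac


def sign_request(secret, user_id, nonce, method, uri, body):
    # The user id and nonce are part of the signed message.
    message = "\n".join([user_id, str(nonce), method, uri, body]).encode("utf-8")
    return hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()


def verify_request(secret, user_id, nonce, method, uri, body, signature):
    expected = sign_request(secret, user_id, nonce, method, uri, body)
    # Constant-time comparison to avoid leaking how many characters matched.
    return hmac.compare_digest(expected, signature)


signature = sign_request("shared-secret", "alice", 1001, "POST", "/ressource", "{}")
valid = verify_request("shared-secret", "alice", 1001, "POST", "/ressource", "{}", signature)
forged = verify_request("shared-secret", "mallory", 1001, "POST", "/ressource", "{}", signature)
print(valid, forged)
```

An existing OAuth library would take care of parameter normalization and encoding, which this sketch deliberately skips.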

Web Services design

Company A has an async, polling-based web service for notifications. Company B checks for notifications. Every time B reads new notifications, A deletes them from the system, so subsequent read requests return only new notifications. There is also a requirement for the client B to interrupt the connection if there is no response within 30 sec.
This causes one potential problem: due to unexpected slowness, it is possible for A to receive the request, delete a notification, and send the response back after B has already interrupted the connection. In this scenario the notification is lost. One can argue that the core problem lies in the operations realm (the HTTP response must be delivered within 20 sec), but in practice that is not always feasible.
How should B (the client) be designed to avoid this problem?
One way I can see is for A not to delete the notifications and for B to keep track of its own state, so that it knows which ID to start processing notifications from. But that presumes the IDs are sequential, which is controlled by A. Even if B defines its own sequence, A still has to be altered to return it.
Are there any other approaches?
Thanks!
Web services in general are unreliable enough that it's rarely a good idea to make a "read" request serve double-duty as a "delete" request, especially without the client's knowledge. There is just too much risk of a connection dropping or timing out. There is no way to get around this only by modifying the client, because it's the server that is at fault here - the way it's designed is fundamentally unsuited for a web service.
I think you're on the right track with the incrementing IDs idea. The client knows (or can be modified to know) which notifications it's received, so if it can supply the ID of the last message it's received when it polls for notifications, the server should be able to respond based on that ID.
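A rough sketch of that polling scheme, assuming the server assigns increasing integer IDs and never deletes on read. The class names and the `response_lost` flag (used only to simulate a timed-out connection) are hypothetical.

```python
class NotificationServer:
    """Hypothetical Company A side: keeps notifications, filters by ID."""

    def __init__(self):
        self._notifications = []  # list of (id, message)
        self._next_id = 1

    def publish(self, message):
        self._notifications.append((self._next_id, message))
        self._next_id += 1

    def poll(self, after_id=0):
        # Reading does not delete anything; the client's cursor decides.
        return [(i, m) for i, m in self._notifications if i > after_id]


class PollingClient:
    """Hypothetical Company B side: remembers the last ID it processed."""

    def __init__(self, server):
        self.server = server
        self.last_seen_id = 0

    def poll(self, response_lost=False):
        batch = self.server.poll(self.last_seen_id)
        if response_lost:
            return []  # simulated timeout: the cursor is not advanced
        if batch:
            self.last_seen_id = batch[-1][0]
        return batch


server = NotificationServer()
client = PollingClient(server)
server.publish("a")
server.publish("b")

lost = client.poll(response_lost=True)
recovered = client.poll()
server.publish("c")
latest = client.poll()
print(lost, recovered, latest)
```

A lost response costs nothing here: the next poll asks again from the same ID, and the server can prune old notifications separately once it is safe to do so.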
It really seems like Company A's web service should be synchronous instead of asynchronous. If that is not possible, it may be a good idea to send an "ACK"-like response to a new Company A web service, indicating that a specific notification was received (by Company B) and can be deleted.