Dealing with a web api failure

If I have a web API service (Order Notification) that a third-party client calls periodically (every 10 minutes) to get the new orders it has not yet received (they must call in to us; we cannot push to them), how do I deal with failures?
For example, there are 10 new orders the client has not received since it last called in. The client calls our Order Notification service. We retrieve the orders we have not sent (10 in this case), mark these 10 orders as sent, and return the response to the client.
However, the client does not receive the response (something happened after it left us, e.g. an HTTP timeout or something else).
So now we have a problem: on our side the orders are marked as sent, but the client never received them.
Any thoughts on how to solve this?

Just an idea: can you assign the caller some sort of identifier, and have the caller reply once it has succeeded to acknowledge the response? The server will never know that something failed on the client side unless the client reports it.
For example, when caller A calls in for the requests it may do something like this:
call -> http://server/requests
The server replies with some XML that contains the result set for this caller, along with a unique identifier that it tracks to know whether that particular call was acknowledged (you can time out this identifier after a reasonable period of time).
When the client gets the response, it can call back again:
call -> http://server/requestComplete?id=[generatedID]
and the server marks it successful.
Lots of APIs require some sort of identification token, so this would already lend itself well to this kind of send/ack messaging system.
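The send/ack flow above could be sketched roughly like this in Python. Everything here is an illustrative assumption rather than part of any real API: the `OrderNotifications` class, its `fetch` and `ack` methods, and the in-memory lists standing in for a database. The point is only that orders are marked as sent in `ack`, never in `fetch`.

```python
import uuid


class OrderNotifications:
    """Hypothetical in-memory stand-in for the order store."""

    def __init__(self, order_ids):
        self.unsent = list(order_ids)  # orders the client has not confirmed yet
        self.pending = {}              # batch id -> orders handed out, awaiting ack

    def fetch(self):
        """Hand out all unconfirmed orders under a fresh batch id.

        Nothing is marked as sent here, so a lost response costs nothing.
        """
        batch_id = str(uuid.uuid4())
        self.pending[batch_id] = list(self.unsent)
        return batch_id, list(self.unsent)

    def ack(self, batch_id):
        """Mark a batch as sent only once the client confirms receipt."""
        delivered = self.pending.pop(batch_id, None)
        if delivered is None:
            return False  # unknown, expired, or already acknowledged batch
        self.unsent = [o for o in self.unsent if o not in delivered]
        return True
```

If the response to `fetch` is lost, the next `fetch` simply returns the same orders again. The trade-off is that the client may see an order twice (for example when the ack itself is lost), so it should ignore order IDs it has already processed.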

If you have access to both sides of the system, you could add a "received" request: once the client picking up the data has received it, it makes a request to the original host to say that the data was received successfully.

Related

Handle side effects caused by duplicate POST requests

Let's say we have a web service that creates and updates meeting room bookings. Updates can change various aspects of a booking, such as time and room number.
Let's imagine that a user's network connection to the service may not be reliable (e.g. a mobile network), and two users A and B try to update the same booking sequentially.
User A sends a POST request to change the meeting time to 2pm. The request reaches the server and the server processes it successfully. However, the response back to User A gets lost due to the network connection, and User A thinks the request failed.
Before User A tries again, User B sends her request to change the meeting time to 2:30pm; it succeeds, and the response reaches User B successfully.
Now User A retries (perhaps automatically) the same request, and this time both the request and the response succeed without a problem. In other words, the meeting time is changed back to 2pm.
In the hypothetical scenario above, User A's duplicated request causes User B's request to be overwritten, and results in an incorrect state on the server side.
One possible but naive solution is to set an ID for every request on the client, where the ID does not change if a request is simply retried/re-sent. Then the server maintains a collection of received request IDs and checks for duplicates.
What are the better techniques or methods for solving this problem?

This is a common problem with concurrent users. One way to solve it is to enforce conditional requests, requiring clients to send the If-Unmodified-Since header with the Last-Modified value for the resource they are attempting to change. That guarantees nobody else changed it between the last time they checked and now. In your case, this would prevent A from overwriting B's changes.
For instance, user A wants to change the meeting time. It sends a GET request for the meeting resource and keeps the value of the Last-Modified response header. Then it sends the POST request with that Last-Modified value in the If-Unmodified-Since header. Following your example, this request actually succeeds, but the response is lost.
If A repeats the request immediately, it will fail with 412 Precondition Failed, since the condition is no longer valid.
If in the meantime B does the same thing and changes the meeting time again, then when A repeats the request without first fetching the current Last-Modified value corresponding to B's changes, it also fails with 412 Precondition Failed.
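As a rough illustration of the server-side check, here is a minimal Python sketch. The `Booking` class and `update_time` function are hypothetical names, timestamps are passed in explicitly to keep the example deterministic, and rejecting unconditional updates with 428 Precondition Required is one possible way to "enforce" the header, not something the answer prescribes.

```python
from datetime import datetime, timezone


class Booking:
    """Hypothetical booking resource with a Last-Modified timestamp."""

    def __init__(self, time, last_modified):
        self.time = time
        self.last_modified = last_modified


def update_time(booking, new_time, if_unmodified_since, now):
    """Apply a conditional update and return an HTTP-style status code."""
    if if_unmodified_since is None:
        return 428  # Precondition Required: refuse unconditional updates
    if booking.last_modified > if_unmodified_since:
        return 412  # Precondition Failed: changed since the client last looked
    booking.time = new_time
    booking.last_modified = now
    return 200


def at(minute):
    """Helper for readable, fixed timestamps in the example."""
    return datetime(2024, 1, 1, 12, minute, tzinfo=timezone.utc)


booking = Booking("1pm", at(0))
seen_by_a = booking.last_modified                      # A's GET
first = update_time(booking, "2pm", seen_by_a, at(1))  # succeeds, response lost
by_b = update_time(booking, "2:30pm", at(1), at(2))    # B read A's change first
retry = update_time(booking, "2pm", seen_by_a, at(3))  # A's stale retry
```

Here `retry` is rejected, so the booking keeps B's 2:30pm. In real HTTP the dates in these headers have one-second resolution, so a strong ETag with If-Match may be a safer validator when updates can arrive within the same second.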

Should idempotent service respond with error after first call?

I'm writing the service layer for a DDD app.
Services are exposed through JSON-RPC over WSS.
I'm not sure how to respond to redundant calls to a service.
Some facts about the system:
1. All requests must be completed within a specific time, or a timeout exception occurs.
2. If the system is under heavy load, it may decide to discard a request (visible as a timeout).
3. If the system is under heavy load, some messages may expire in the queue (visible as a timeout).
4. Even if a request reaches its destination, the ACK may not reach the user in time (visible as a timeout).
5. The end user has the right to re-invoke a method if the ACK didn't arrive in time.
6. No guarantees on request completion are given.
Thus the need for idempotency.
A problem arises if we consider the implications of [4]+[5]:
1. The user invokes method setFoo(Bar).
2. The entity is created, but the ACK doesn't make it in time.
3. The user receives a timeout and assumes that he should try again, so he re-invokes setFoo(Bar).
4. The entity already exists -> hmm...
The question is: should the user get ACK or Error(I've already done that mate...)?

An idempotent operation should have the same behaviour when it is called multiple times. This suggests that the return value should be the same as well, so in the scenario you are describing above, the user should get ACK.
Consider the alternative; if you return an error to the user, then how should the user respond? What "error handling" is appropriate?
You can make an argument for a response of ACK(I've already done that mate...) but the part in brackets should be a purely optional informative field, not something that affects how the user processes the response.
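A minimal Python sketch of that behaviour, assuming a plain dict as the entity store and a hypothetical `set_foo` handler; the response shape (`status`, `info`) is made up for illustration:

```python
def set_foo(store, entity_id, bar):
    """Idempotent handler: repeating the call yields the same state and status."""
    already_applied = store.get(entity_id) == bar
    store[entity_id] = bar
    return {
        "status": "ACK",
        # Purely informative; callers must not branch on this field.
        "info": "already applied" if already_applied else None,
    }


store = {}
first = set_foo(store, "e1", "Bar")   # ACK lost in transit
second = set_foo(store, "e1", "Bar")  # user retries after the timeout
```

Both calls return `status == "ACK"`, and the store holds a single entity either way, so the retrying user needs no special error handling.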

Webservice with always in memory object with queue

I have a function that gives recommendations to users. This function needs to do a lot of calculation at startup, but after that it uses the already calculated matrix in memory. From then on, every further calculation "fills" the in-memory object, for continuous learning.
My intention is to offer this function to website users, but the responses need to come from the same in-memory "object", and requests need to be processed sequentially because it is not thread safe.
What is the best way to get this working? My first idea was to use SignalR, so the user doesn't need to wait for the response, plus a queue to send the requests to the object. But how the signalr can receive the response for this specific request?
The entire flow is:
The user enters a page.
JavaScript calls a service with the user ID and the current page.
The server queues the ID and page.
The service calculates the results for each request in the queue and sends responses.
The server "receives" the response and sends it back to the client.
The main problem is that I don't see a way for the server to receive the response, once it is complete, and send it back to the client without having to loop over queues.
Thanks!

If you are going to use SignalR, I would suggest using a hub method to accept these potentially long running requests from the client. By doing so it should be obvious "how the signalr can receive the response for this specific request".
You should be able to queue your calculations from inside your hub method where you will have access to the caller's connection id (via the Context.ConnectionId property).
If you can await the results of your queued operation inside of the hub method you queue from, you can then simply return the result from your hub method and SignalR will flow the result back to the calling JavaScript. You can also use Clients.Caller.... to send the result back.
If you go this route I suggest you use async/await instead of blocking request threads waiting for your long-running calculations to complete.
http://www.asp.net/signalr/overview/signalr-20/hubs-api/hubs-api-guide-server
If you can't process your calculation results from the same method you queued the calculation from, you still have options. Just be sure to queue the caller's connection id and a request id along with the calculation to be processed.
Then, you can process the results of all your calculations from outside of your hub using GlobalHost.ConnectionManager.GetHubContext:
// Requires: using Microsoft.AspNet.SignalR;
private readonly IHubContext _context = GlobalHost.ConnectionManager.GetHubContext<MyHub>();
// Call ProcessResults whenever results are ready to send back to the client
public void ProcessResults(string connectionId, uint requestId, MyResult result)
{
    // Presumably there's JS code mapping request ids to results
    // if you can have multiple ongoing requests per client
    _context.Clients.Client(connectionId).receiveResult(requestId, result);
}
http://www.asp.net/signalr/overview/signalr-20/hubs-api/hubs-api-guide-server#callfromoutsidehub
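The underlying pattern (one consumer owning the non-thread-safe object, with each request awaiting its own result) is not specific to SignalR. Below is a language-neutral sketch of it in Python with asyncio rather than C#; `worker`, `recommend`, and the list standing in for the in-memory model are all hypothetical. The per-request future plays the role that the connection id plus request id play in the hub approach.

```python
import asyncio


async def worker(queue, model):
    """Single consumer: requests touch the shared model strictly one at a time."""
    while True:
        user_id, page, reply = await queue.get()
        model.append((user_id, page))  # stand-in for the real calculation
        reply.set_result(f"recommendations for {user_id} on {page}")
        queue.task_done()


async def recommend(queue, user_id, page):
    """Per-request entry point: enqueue the work, then await this request's reply."""
    reply = asyncio.get_running_loop().create_future()
    await queue.put((user_id, page, reply))
    return await reply


async def main():
    queue, model = asyncio.Queue(), []
    task = asyncio.create_task(worker(queue, model))
    results = await asyncio.gather(
        recommend(queue, 1, "/home"),
        recommend(queue, 2, "/cart"),
    )
    task.cancel()  # stop the worker once the example requests are done
    return results, model
```

Nothing loops over a result queue here: each caller is resumed when the worker completes its future, which mirrors awaiting the queued operation inside the hub method.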

Architecture for robust payment processing

Imagine 3 system components:
1. External ecommerce web service to process credit card transactions
2. Local database to store processing results
3. Local UI (or Windows service) to perform payment processing of the customer order document
The external web service is obviously not transactional, so how can we guarantee that:
1. results received from the web service are eventually persisted to the database, even if the database is not accessible at that moment (network issue, DB timeout)
2. clients are prevented from processing the customer order while a payment initiated by another client has results that are not yet successfully persisted to the database (and are waiting in some kind of recovery queue)
The aim is to do the processing with non-transactional system components and guarantee that the transaction won't be repeated by another process in case of failure.
(Please look at it in the context of post-sale payment processing, where multiple operators might attempt manual payment processing; not a web checkout application.)

Ask the payment processor whether they can detect duplicate transactions based on an order ID you supply. Then if you are unable to store the response due to a database failure, you can safely resubmit the request without fear of double-charging (at least one PSP I've used returned the same response/auth code in this scenario, along with a flag to say that this was a duplicate).
Alternatively, just set a flag on your order immediately before attempting payment, and don't attempt payment if the flag was already set. If an error then occurs during payment, you can investigate and fix the data at your leisure.
I'd be reluctant to go down the route of trying to automatically cancel the order and resubmitting, as this just gets confusing (e.g. what if cancelling fails - should you retry or not?). Best to keep the logic simple so when something goes wrong you know exactly where you stand.
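The "set a flag before attempting payment" idea only works if checking and setting the flag happen as one atomic step. A minimal sketch using Python's sqlite3 module, assuming a hypothetical `orders` table with a `payment_started` column (both names are made up for illustration):

```python
import sqlite3


def try_claim(conn, order_id):
    """Atomically set the flag; True only for the caller that flipped it."""
    cur = conn.execute(
        "UPDATE orders SET payment_started = 1 "
        "WHERE id = ? AND payment_started = 0",
        (order_id,),
    )
    conn.commit()
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, payment_started INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO orders (id) VALUES (42)")
first = try_claim(conn, 42)   # this operator may go on to call the payment service
second = try_claim(conn, 42)  # a second operator is turned away
```

The flag is deliberately never cleared automatically: if the payment call fails or its result cannot be stored, the order stays claimed until someone investigates, which matches the "fix the data at your leisure" approach above.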

In any system like this, you need robust error handling and error reporting. This is doubly true when it comes to dealing with payments, where you absolutely do not want to accidentally take someone's money and not deliver the goods.
Because you're outsourcing your payment handling to a 3rd party, you're ultimately very reliant on the gateway having robust error handling and reporting systems.
In general, then, you hand off control to the payment gateway and start a task that waits for a response from the gateway, which is either 'payment accepted' or 'payment declined'. When you get that response you move on to the next step in your process and everything is good.
When you don't get a response at all (timeout), or the response is invalid, then how you proceed very much depends on the payment gateway:
If the gateway supports it, send a 'cancel payment' style request. If the payment cancels successfully then you probably want to send the user to a 'sorry, please try again' style page.
If the gateway doesn't support cancelling, or you have no communication with the gateway, then you will need to contact the 3rd party manually (in person, such as by telephone) to discover what went wrong and how to proceed. To aid this you need to dump as much detail as you have to error logs, such as date/time, customer id, transaction value, product ids etc.
Once you're back on your site (and payment is accepted) you're much more in control of errors, but in brief, if you can't complete the order then you should either dump the details to disk (such as a CSV file for manual handling) or contact the gateway to cancel the payment.
It's also worth having a system in place to track errors as they occur, and to decide what should happen if an excessive number occur. If it's a high-traffic site, for example, you may want to temporarily prevent further customers from placing orders whilst the issue is investigated.

Distributed messaging.
When your payment gateway returns, submit a message to a durable queue that guarantees a handler will eventually get it and process it. The handler would update the database. Should a failure occur at that point, the handler can leave the message in the queue, repost it to the queue, or post an alternate message.
Should something occur later that invalidates the transaction, another message could be queued to "undo" the change.
There's a fair amount of buzz lately about eventual consistency and distributed messaging. NServiceBus is the new component hotness. I suggest looking into this; I know we are.
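To make the handler behaviour concrete, here is a toy Python sketch. It is not NServiceBus and not durable: a `collections.deque` stands in for the queue, and `drain`, `save`, and `max_attempts` are hypothetical names. It only shows the rule that a message leaves the queue after the database write succeeds, and is parked as an alternate ("dead letter") message after repeated failures.

```python
from collections import deque


def drain(queue, save, max_attempts=3):
    """Process messages in order; remove one only after save() succeeds."""
    dead_letters = []
    while queue:
        message = queue[0]  # peek, do not remove yet
        message["attempts"] = message.get("attempts", 0) + 1
        try:
            save(message["payload"])
        except Exception:
            if message["attempts"] >= max_attempts:
                dead_letters.append(queue.popleft())  # park for manual review
            continue  # otherwise leave it at the head and try again
        queue.popleft()
    return dead_letters
```

A real handler would wait between attempts instead of retrying immediately, and `save` should itself be idempotent, since a crash between a successful write and the removal of the message leads to the same message being delivered again.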

Web Services design

Company A has an asynchronous, polling-based web service for notifications. Company B checks for notifications. Every time B reads new notifications, A deletes them from the system, so subsequent read requests return only new notifications. There is also a requirement for the client B to interrupt the connection if there is no response within 30 sec.
This causes one potential problem: due to unexpected slowness, it is possible for A to get the request, delete a notification, and send the response back when B has already interrupted the connection. In this scenario the notification gets lost. Now, one can argue that the core problem lies in the operational realm (the HTTP response must be delivered within 20 sec), but in practice that is not always feasible.
How should B (the client) be designed to avoid this problem?
One way I can see is for A not to delete the notifications and for B to be aware of its own state, so that it knows from which ID it needs to process notifications. But that presumes the IDs are sequential, which is controlled by A. Even if B defines its own sequence, A still has to be altered to return it.
Are there any other approaches?
Thanks!

Web services in general are unreliable enough that it's rarely a good idea to make a "read" request serve double-duty as a "delete" request, especially without the client's knowledge. There is just too much risk of a connection dropping or timing out. There is no way to get around this only by modifying the client, because it's the server that is at fault here - the way it's designed is fundamentally unsuited for a web service.
I think you're on the right track with the incrementing IDs idea. The client knows (or can be modified to know) which notifications it's received, so if it can supply the ID of the last message it's received when it polls for notifications, the server should be able to respond based on that ID.
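A small Python sketch of the last-seen-ID approach, under the assumptions that notification IDs increase over time and that the server returns them in ascending order. `notifications_after`, `Client`, and the `response_lost` switch (which simulates B cutting the connection) are hypothetical names used only for this example.

```python
def notifications_after(log, last_seen_id):
    """Server side: return everything newer than the cursor; delete nothing."""
    return [n for n in log if n["id"] > last_seen_id]


class Client:
    """Client side: remembers the highest notification id it has processed."""

    def __init__(self):
        self.last_seen_id = 0
        self.received = []

    def poll(self, log, response_lost=False):
        batch = notifications_after(log, self.last_seen_id)
        if response_lost:
            return  # timed out: cursor unchanged, so the next poll asks again
        self.received.extend(batch)
        if batch:
            self.last_seen_id = batch[-1]["id"]
```

Because the cursor only advances after a response has actually been processed, a dropped connection just means the same notifications are returned on the next poll. The server can prune old notifications separately, for example once they are older than any cursor it has seen.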

It really seems like Company A's web service should be synchronous instead of asynchronous. If that is not possible, it may be a good idea to send an "ACK"-like response to a new Company A web service, indicating that a specific notification was received (by Company B) and can be deleted.
