Architecture for robust payment processing


Imagine 3 system components:
1. External ecommerce web service to process credit card transactions
2. Local Database to store processing results
3. Local UI (or win service) to perform payment processing of the customer order document
The external web service is obviously not transactional, so how can I guarantee that:
1. results received from the web service are eventually persisted to the database, even if the database is not accessible at that moment (network issue, db timeout)
2. other clients are prevented from processing the customer order while a payment initiated by another client has not yet been successfully persisted to the database (and is waiting in some kind of recovery queue)
The aim is to process payments using non-transactional system components while guaranteeing that the transaction won't be repeated by another process in case of failure.
(Please look at this in the context of post-sale payment processing, where multiple operators might attempt manual payment processing; not a web checkout application.)

Ask the payment processor whether they can detect duplicate transactions based on an order ID you supply. Then if you are unable to store the response due to a database failure, you can safely resubmit the request without fear of double-charging (at least one PSP I've used returned the same response/auth code in this scenario, along with a flag to say that this was a duplicate).
Alternatively, just set a flag on your order immediately before attempting payment, and don't attempt payment if the flag was already set. If an error then occurs during payment, you can investigate and fix the data at your leisure.
I'd be reluctant to go down the route of automatically cancelling the order and resubmitting, as this just gets confusing (e.g. what if cancelling fails - should you retry or not?). Best to keep the logic simple so when something goes wrong you know exactly where you stand.
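To make the "set a flag before attempting payment" idea concrete, here is a minimal sketch. It assumes the orders live in a SQL table (SQLite is used here only so the example is self-contained), and the table name, the payment_started column and the charge callable are all hypothetical. The point is that the flag is set with a single conditional UPDATE, so only one operator can win.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical minimal orders table; a real one would hold far more.
    conn.execute(
        "CREATE TABLE orders ("
        "id TEXT PRIMARY KEY, "
        "payment_started INTEGER NOT NULL DEFAULT 0)"
    )


def try_claim_payment(conn, order_id):
    """Set the flag atomically; return True only for the caller that set it."""
    cur = conn.execute(
        "UPDATE orders SET payment_started = 1 "
        "WHERE id = ? AND payment_started = 0",
        (order_id,),
    )
    conn.commit()
    return cur.rowcount == 1


def process_payment(conn, order_id, charge):
    """Attempt payment only if nobody else has already started it.

    `charge` is a hypothetical callable wrapping the external web service.
    If it raises, the flag stays set so the order can be investigated by hand.
    """
    if not try_claim_payment(conn, order_id):
        return "skipped: payment already initiated"
    return charge(order_id)
```

A second operator who calls process_payment for the same order gets the "skipped" result and the external service is never contacted again. Nothing here clears the flag automatically, in line with keeping the logic simple.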

In any system like this, you need robust error handling and error reporting. This is doubly true when it comes to dealing with payments, where you absolutely do not want to accidentally take someone's money and not deliver the goods.
Because you're outsourcing your payment handling to a 3rd party, you're ultimately very reliant on the gateway having robust error handling and reporting systems.
In general then, you hand off control to the payment gateway and start a task that waits for a response from the gateway, which is either 'payment accepted' or 'payment declined'. When you get that response you move onto the next step in your process and everything is good.
When you don't get a response at all (time out), or the response is invalid, then how you proceed very much depends on the payment gateway:
If the gateway supports it, send a 'cancel payment' style request. If the payment cancels successfully then you probably want to send the user to a 'sorry, please try again' style page.
If the gateway doesn't support canceling, or you have no communications to the gateway then you will need to manually (in person, such as telephone) contact the 3rd party to discover what went wrong and how to proceed. To aid this you need to dump as much detail as you have to error logs, such as date/time, customer id, transaction value, product ids etc.
Once you're back on your site (and payment is accepted) then you're much more in control of errors, but in brief, if you can't complete the order, then you should either dump the details to disk (such as a csv file for manual handling) or contact the gateway to cancel the payment.
It's also worth having a system in place to track errors as they occur, and to decide what should happen if an excessive number occur. If it's a high traffic site, for example, you may want to temporarily prevent further customers from placing orders whilst the issue is investigated.

Distributed messaging.
When your payment gateway returns, submit a message to a durable queue that guarantees a handler will eventually get it and process it. The handler would update the database. Should failure occur at that point, the handler can leave the message in the queue, repost it to the queue, or post an alternate message.
Should something occur later that invalidates the transaction, another message could be queued to "undo" the change.
There's a fair amount of buzz lately about eventual consistency and distributed messaging. NServiceBus is the new component hotness. I suggest looking into this; I know we are.
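As a rough illustration of the handler behaviour described above, here is a small sketch. It is not NServiceBus: the queue is an in-memory deque standing in for a durable queue, and FlakyDatabase, drain and the message fields are hypothetical names made up for the example. It shows a message being reposted when the database write fails, and set aside for manual handling after too many attempts.

```python
from collections import deque


class FlakyDatabase:
    """Hypothetical store that rejects the first `failures` writes."""

    def __init__(self, failures):
        self.failures = failures
        self.rows = {}

    def save(self, order_id, result):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("database unavailable")
        self.rows[order_id] = result


def drain(queue, db, max_attempts=5):
    """Handle queued payment results; repost a message when the write fails.

    Messages that fail `max_attempts` times are returned as dead letters
    instead of being retried forever.
    """
    dead_letters = []
    while queue:
        message = queue.popleft()
        try:
            db.save(message["order_id"], message["result"])
        except ConnectionError:
            message["attempts"] += 1
            if message["attempts"] >= max_attempts:
                dead_letters.append(message)
            else:
                queue.append(message)  # repost for a later attempt
    return dead_letters
```

A real broker would persist the messages and delay the retries; the sketch only captures the "leave it or repost it" decision made by the handler.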

Related

Applied Eventually Consistency and Race Conditions

I have a question regarding the effect of eventually consistent (EC) microservice systems.
Imagine we have a booking system - a user-service A and a booking-service B. Each service has its own database. Imagine the system performs concurrent bookings of the same resource for distinct users at the same time. Let's assume we have a Runtime Verification System checking for concurrent bookings.
Would it be possible that the monitor does not notice the concurrent booking at B, because the update in the database is delayed by the EC mechanism?

In your example, the Booking Service is the source of truth (presumably) for whether or not the resource is available to book. So, that service should be pretty clear on allowing the first booking request to happen and rejecting the second.
In a case like this, where "first come first served" is the requirement, you'd want an intermediate state that would wait for a response from the Booking Service and update the User Service only when a response has been received.
If your architecture is set up right, User Service shouldn't be calling Booking Service directly anyway - it should be communicating through a messaging plane. As such, when the User clicks "Book Now," you could generate a resourceBookingRequested message and submit it to the queue. You'd acknowledge this request has been queued to the user and update their UI to "Awaiting Booking Confirmation..." or something similar.
The User Service subscribes to the resulting message; once the booking is accepted or rejected, it updates the UI (and/or takes other actions like sending an email) to let the user know whether their request succeeded.
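The flow above could be sketched roughly as follows. This is only an in-memory illustration: the deque stands in for the messaging plane, and the class names, the result message names (resourceBookingAccepted, resourceBookingRejected) and the status strings other than "Awaiting Booking Confirmation..." are assumptions made for the example.

```python
from collections import deque


class BookingService:
    """Source of truth for availability; handles one message at a time."""

    def __init__(self):
        self.booked_by = {}  # resource_id -> user_id

    def handle(self, message):
        resource_id = message["resource_id"]
        user_id = message["user_id"]
        if resource_id in self.booked_by:
            outcome = "resourceBookingRejected"
        else:
            self.booked_by[resource_id] = user_id
            outcome = "resourceBookingAccepted"
        return {"type": outcome, "user_id": user_id, "resource_id": resource_id}


class UserService:
    def __init__(self):
        self.ui_status = {}  # user_id -> text shown to the user

    def request_booking(self, queue, user_id, resource_id):
        queue.append({
            "type": "resourceBookingRequested",
            "user_id": user_id,
            "resource_id": resource_id,
        })
        # Intermediate state: the request is queued, not yet decided.
        self.ui_status[user_id] = "Awaiting Booking Confirmation..."

    def on_booking_result(self, message):
        accepted = message["type"] == "resourceBookingAccepted"
        self.ui_status[message["user_id"]] = "Booked" if accepted else "Rejected"


def run(queue, booking_service, user_service):
    """Deliver queued requests in order and route each result back."""
    while queue:
        result = booking_service.handle(queue.popleft())
        user_service.on_booking_result(result)
```

Because the Booking Service decides on each request in queue order, two requests for the same resource cannot both be accepted, even though the User Service only learns the outcome later.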

Applying CQRS to charging credit Card (using AKKA)

Given that I am a bit confused with CQRS I would like to understand it further in the following scenario.
I have an Actor that charges users' credit cards. To do so, it contacts an external bank service that performs the operation and gets a confirmation result. I would like to know how I can apply CQRS to this.
The information that needs to be written here is that a specific user has been charged a certain amount. So the event generated is Charged (UserID, Card, Amount). Something like that.
The problem is that all the examples I have seen, especially with AKKA, only generate the event after a Command is validated; the event is then persisted in a journal and used to update the state of the actor. The journal can then be read on the other side to create a read view.
Also, in those examples, the update-state function usually contains the logic that effectively executes the command, because the command corresponds straightforwardly to a state update at the end of the day. This is the typical shopping basket example: CreateOrder, AddLineItem. All of these Commands are directly translated into Events, each of which corresponds to a specific piece of code in the update-state function.
However, in this example, one needs to actually contact an external service, charge the user and then generate an event. Contacting the external service can't be done in the update-state function, or after reading the journal. It would not make sense.
How is that done, and where, and when exactly, in the spirit of CQRS?

I can think of 2 ways of doing this.
First is a simple way. The command is DoCharge(UserId, Card, Amount). Upon reception of this command, you call the external payment service. If this has been successfully completed, you generate an event, Charged(UserId, Card, Amount, TransactionId) and store it in the journal.
Now, of course, this is not a completely safe way, because your Actor can crash after it has sent the request to the payment service, but before it has received and persisted the confirmation of successful completion. Then you risk charging the user twice. To overcome this risk, you have to make your payment operation idempotent. Here's how to do it. This example is based on the classic "RESTify Day trader" article. I'll summarize it here.
You need to split the payment operation into 2 phases. In the first one, the payment service creates a transaction token. It just identifies the transaction, and no financial operations are performed yet. Upon creation, the identifier is received by your service and persisted in the journal.
In the next phase you perform a payment associated with the identifier from phase one. If your actor now fails in the middle, while the operation is performed successfully on the payment service side, the transaction token will already be marked as processed by the payment service, and it won't let you charge the customer twice. Now, if you restart the failed Actor, and it tries to run the payment associated with the existing transaction token, the payment service should return a result like "Already executed" or such. Of course, at the end you also persist the result of the operation in the journal.
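A minimal sketch of the two phases, in plain Python rather than Akka. Everything here is hypothetical: FakePaymentService stands in for the external service, the journal is a plain list, the TokenCreated event and the command_id used to find a previously journaled token are assumptions, and the crash is simulated with an exception.

```python
import uuid


class FakePaymentService:
    """Stand-in for the external service with token-based idempotency."""

    def __init__(self):
        self.executed = {}  # token -> whether the payment was performed
        self.charges = []   # (user_id, amount) actually charged

    def create_transaction(self):
        # Phase 1: only an identifier is created, no money moves.
        token = str(uuid.uuid4())
        self.executed[token] = False
        return token

    def execute(self, token, user_id, amount):
        # Phase 2: charge at most once per token.
        if self.executed[token]:
            return "Already executed"
        self.executed[token] = True
        self.charges.append((user_id, amount))
        return "Executed"


def charge(journal, service, command_id, user_id, amount, crash=False):
    """Handle a DoCharge command, reusing a journaled token after a restart."""
    token = next(
        (e["token"] for e in journal
         if e["type"] == "TokenCreated" and e["command_id"] == command_id),
        None,
    )
    if token is None:
        token = service.create_transaction()
        journal.append(
            {"type": "TokenCreated", "command_id": command_id, "token": token}
        )
    result = service.execute(token, user_id, amount)
    if crash:
        raise RuntimeError("simulated crash before persisting the result")
    journal.append({
        "type": "Charged", "command_id": command_id, "user_id": user_id,
        "amount": amount, "token": token, "result": result,
    })
    return result
```

If the first call crashes after the service has executed the payment, running the same command again finds the token in the journal, gets "Already executed" back, and only then records the Charged event, so the user is charged once.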

The rate of control plane requests made by this account is too high

I'm using AWS DynamoDB and it keeps giving me the following error when trying to create a DB using https://www.npmjs.org/package/dynamodb:
The rate of control plane requests made by this account is too high
Does anyone know what the reason is?
Thanks

Could you share your code that is calling the create? And does this happen every time, or only sometimes? If you can get insight into whether the CreateTable API call is failing, or a DescribeTable API call is failing, that would be helpful too. If you can log the request ids of all of the requests you're making, and share them on this post, we (the DynamoDB folks) can see if we can get more details on our side.
This error may occur when you create, update, or delete many tables simultaneously (that is, call the API with many operations at once). This is easy to do in Node.js because of its non-blocking programming model. The error may also happen if you call CreateTable and then call DescribeTable at the same time or immediately afterwards (this typically doesn't happen though).
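One way to avoid firing many table operations at once is to run them strictly one after another and back off when throttled. The sketch below is generic and does not use any real AWS SDK call: ThrottledError is a hypothetical stand-in for whatever error your client raises, and each operation is just a callable you supply (for example, one that creates a single table and waits for it).

```python
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for the 'rate of control plane requests' error."""


def run_one_at_a_time(operations, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Run table operations sequentially, backing off exponentially when throttled.

    `operations` is a list of zero-argument callables. The next one starts
    only after the previous one has returned.
    """
    results = []
    for operation in operations:
        for attempt in range(max_retries + 1):
            try:
                results.append(operation())
                break
            except ThrottledError:
                if attempt == max_retries:
                    raise  # give up after the last allowed retry
                sleep(base_delay * 2 ** attempt)
    return results
```

The same idea applies in Node.js: chain the calls (or await them in a loop) instead of starting them all in the same tick.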

How to handle an online payment handled using Web Services safely?

Disclaimer: I only have a very basic understanding of how Web Services work and don't know much about advanced WS topics such as transactions, etc.
Let's pretend that I am developing an online store using Java EE, JPA, etc. Also let's pretend that I have a contract with an online payment processing provider to handle payments and they have provided me with a WS API.
Now let's pretend that a customer has placed an order. In a session bean (e.g. inside OrderSB.placeOrder) I have opened a transaction, saved an Order in the DB, and now I am making a call to the payment provider's WS API. It returns successfully (and I assume that by now my customer's account has been debited) but before I can save the Order's associated Payment (there's a one-to-one relationship between Order and Payment) an exception occurs and my transaction is rolled back.
How is it possible to ensure that when such an exception happens, my customer's account is not debited? In other words, either both the WS call and OrderSB.placeOrder should complete successfully and commit, or both should be rolled back together.
It's easy to roll back placeOrder if the WS call fails, but I don't know how I can roll back the WS call after it returns.

Why don't you complete the placeOrder flow and make the WS call only if the first finished successfully? Then, since you say it is easy to roll back placeOrder, if errors appear in the second step, just roll back the first one. Or am I not understanding your question correctly?

Web Services design

Company A has an async polling-based web service for notifications. Company B checks for notifications. Every time B reads new notifications, A deletes them from the system. Thus subsequent read requests return only new notifications. There is also a requirement for the client B to interrupt the connection if there is no response within 30 sec.
This causes one potential problem: due to unexpected slowness it is possible for A to receive the request, delete a notification and send the response back while B has already interrupted the connection. In this scenario the notification gets lost. Now one can argue that the core problem lies within the operational realm (the HTTP response must be delivered within 20 sec), but in practice that is not always feasible.
How to design B (the client) to avoid this problem?
One way I can see is for A not to delete the notifications and to make B aware of its own state, so that it knows from which ID it needs to start processing notifications, but that presumes the ID is sequential, which is controlled by A. Even if B defines its own sequence, A still has to be altered to return it.
Are there any other approaches?
Thanks!

Web services in general are unreliable enough that it's rarely a good idea to make a "read" request serve double-duty as a "delete" request, especially without the client's knowledge. There is just too much risk of a connection dropping or timing out. There is no way to get around this only by modifying the client, because it's the server that is at fault here - the way it's designed is fundamentally unsuited for a web service.
I think you're on the right track with the incrementing IDs idea. The client knows (or can be modified to know) which notifications it's received, so if it can supply the ID of the last message it's received when it polls for notifications, the server should be able to respond based on that ID.
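Here is a small sketch of the "client supplies the last ID it received" idea. It is an in-memory model, not a web service: NotificationServer, fetch_after and the response_lost switch (which simulates the client giving up before the response arrives) are all hypothetical names made up for the example.

```python
class NotificationServer:
    """Keeps notifications; reading them does not delete them."""

    def __init__(self):
        self._items = []  # list of (id, payload) with increasing ids
        self._next_id = 1

    def publish(self, payload):
        self._items.append((self._next_id, payload))
        self._next_id += 1

    def fetch_after(self, last_id):
        # Return everything newer than the ID the client says it has seen.
        return [(i, p) for (i, p) in self._items if i > last_id]


class Client:
    def __init__(self):
        self.last_id = 0
        self.received = []

    def poll(self, server, response_lost=False):
        batch = server.fetch_after(self.last_id)
        if response_lost:
            # Timeout on the client side: nothing is recorded, so the
            # next poll repeats the same request and gets the same data.
            return
        for notification_id, payload in batch:
            self.received.append(payload)
            self.last_id = notification_id
```

A lost response costs nothing but a repeated request, because the server's state does not change on a read. The server can prune old notifications separately, for example once the client has asked for IDs beyond them.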

It really seems like Company A's webservice should be synchronous instead of asynchronous. If that is not possible, it may be a good idea to send an "ACK"-like response to a new Company A webservice that indicates a specific notification was received (by Company B) and can be deleted.
