How to keep system running while replacing the current engine

How to keep system running while replacing the current engine - concurrency

We have two systems, A and B. System B sends Write and Read request as well as A returns a response for every read request using the existing engine E_current in A. Each Write request causes a modification in the existing engine E_current.
Periodically E_current will be replaced by E_new. While in renewal process, E_new is not able to used yet. Some of the Read request that comes while this renewal process depends on Write request that came after beginning of the renewal process. The new engine, E_new, should also do modifications on itself for each Write request that came during the renewal process and already processed by .
After completion of renewal process E_current will be evicted and E_new becomes E_current.
Requirements:
Requests are completely concurrent. For example, a write request can
come while a read request is being processed.
Multiple modifications on any engine E could cause inconsistent state, state consistency should be preserved.
Diagrams:
https://dl.dropbox.com/u/3482709/stack1.jpg
https://dl.dropbox.com/u/3482709/stack2.jpg

Related

How to effectively stay in-sync when using dfuse streaming API

I'm using dfuse streaming API to built a EOS based application.
I want to keep an in-sync view of all the data flowing through my smart contract.
I want to ensure that I will always be in sync so I never miss a single block. Using the dfuse streaming API, how can I achieve that?

Using the with_progress feature of the Websocket API, you will receive one progress message after each block was processed.
For a given stream (corresponding to a request and an associated req_id), when you received a progress for a block, you are guaranteed to have seen all the contents it contained in your query (the actions for get_action_traces, or the rows for get_table_rows).
If you keep track of that block_num and/or block_id, upon disconnection/reconnections, you can provide it back in your request, and be guaranteed never to miss a beat, even if that means you're reprocessing 1M blocks.
As of November 22nd 2018, the get_table_rows request will stream table_delta messages that include an "undo"/"redo" step. This allows your app to navigate forks and ensure absolute sync of your application with the longest chain's state.
Check the docs at https://docs.dfuse.io/ and search for with_progress for more details.

Camel route POSTs to service that takes 20+ minutes to respond

I have an Apache Camel (version 2.15.3) route that is configured as follows (using a mix of XML and Java DSL):
Read a file from one of several folders on an FTP site.
Set a header to indicate which folder it was read from.
Do some processing and auditing.
Synchronously POST to an external REST service (jax-rs 1.1, Glassfish, Java EE 6).
The REST service takes a long time to do its job, 20+ minutes.
Receive the reply.
Do some more processing and auditing.
Write the response to one of several folders on an FTP site.
Use the header set at the start to know which folder to write to.
This is all configured in a single path of chained routes.
The problem is that the connection to the external REST service will timeout while the service is still processing. The infrastructure is a bit complex (edge servers, load balancers, Glassfish), and regardless I don't think increasing the timeout is the right solution.
How can I implement this route such that I avoid timeouts while still meeting all my requirements to (1) write the response to the appropriate FTP folder, (2) audit the transaction, and (3) meet other transaction/context-specific requirements?
I'm relatively new to Camel and REST, so maybe this is easy, but I don't know what Camel and REST tools and techniques to use.
(Questions and suggestions for improvement are welcome.)

Isn't it possible to break the two main steps a part and have two asynchronous operations?
I would do as follows.
Read a file from one of several folders on an FTP site.
Set a header to indicate which folder it was read from.
Save the header and file name and other relevant information in a cache. There is a camel component called camel-cache that is relatively easy to setup and you can store key-value or any other objects.
Do some processing and auditing. Asynchronously POST to an external REST service (jax-rs 1.1, Glassfish, Java EE 6). Note that we are posting asynchronously here.
Step 2.
Receive the reply.
Lookup the reply identifiers i.e. filename or some other identifier in cache to match the reply and then fetch the header.
Do some more processing and auditing.
Write the response to one of several folders on an FTP site.
This way, you don't need to wait and processing can take 20 min or longer. You just set your cache values to not expire for say 24h.

This is a typical asynchronous use case. Can the rest service give you a token id or some unique id immediately after you hit them ?
So that you can have a batch job or some other camel route which will pick up this id from a database/cache and hit the rest service again after 20 minutes.
This is the ideal solution I can think of, if the rest service can provision this.
You are right, waiting for 20 minutes on a synchronous call is a crazy idea. Also what is the estimated size of the file/payload which you are planning to post to the rest service ?

Architecture for robust payment processing

Imagine 3 system components:
1. External ecommerce web service to process credit card transactions
2. Local Database to store processing results
3. Local UI (or win service) to perform payment processing of the customer order document
The external web service is obviously not transactional, so how to guarantee:
1. results to be eventually persisted to database when received from web service even in case the database is not accessible at that moment(network issue, db timeout)
2. prevent clients from processing the customer order while payment initiated by other client but results not successfully persisted to database yet(and waiting in some kind of recovery queue)
The aim is to do processing having non transactional system components and guarantee the transaction won't be repeated by other process in case of failure.
(please look at it in the context of post sell payment processing, where multiple operators might attempt manual payment processing; not web checkout application)

Ask the payment processor whether they can detect duplicate transactions based on an order ID you supply. Then if you are unable to store the response due to a database failure, you can safely resubmit the request without fear of double-charging (at least one PSP I've used returned the same response/auth code in this scenario, along with a flag to say that this was a duplicate).
Alternatively, just set a flag on your order immediately before attempting payment, and don't attempt payment if the flag was already set. If an error then occurs during payment, you can investigate and fix the data at your leisure.
I'd be reluctant to go down the route of trying to automatically cancel the order and resubmitting, as this just gets confusing (e.g. what if cancelling fails - should you retry or not?). Best to keep the logic simple so when something goes wrong you know exactly where you stand.

In any system like this, you need robust error handling and error reporting. This is doubly true when it comes to dealing with payments, where you absolutely do not want to accidentaly take someone's money and not deliver the goods.
Because you're outsourcing your payment handling to a 3rd party, you're ultimately very reliant on the gateway having robust error handling and reporting systems.
In general then, you hand off control to the payment gateway and start a task that waits for a response from the gateway, which is either 'payment accepted' or 'payment declined'. When you get that response you move onto the next step in your process and everything is good.
When you don't get a response at all (time out), or the response is invalid, then how you proceed very much depends on the payment gateway:
If the gateway supports it send a 'cancel payment' style request. If the payment cancels successfully then you probably want to send the user to a 'sorry, please try again' style page.
If the gateway doesn't support canceling, or you have no communications to the gateway then you will need to manually (in person, such as telephone) contact the 3rd party to discover what went wrong and how to proceed. To aid this you need to dump as much detail as you have to error logs, such as date/time, customer id, transaction value, product ids etc.
Once you're back on your site (and payment is accepted) then you're much more in control of errors, but in brief if you cant complete the order, then you should either dump the details to disk (such as csv file for manual handling) or contact the gateway to cancel the payment.
Its also worth having a system in place to track errors as they occur, and if an excessive number occur then consider what should happen. If its a high traffic site for example you may want to temporarily prevent further customers from placing orders whilst the issue is investigated.

Distributed messaging.
When your payment gateway returns submit a message to a durable queue that guarantees a handler will eventually get it and process it. The handler would update the database. Should failure occur at that point the handler can leave the message in the queue or repost it to the queue, or post an alternate message.
Should something occur later that invalidates the transaction, another message could be queued to "undo" the change.
There's a fair amount of buzz lately about eventual consistency and distribute messaging. NServiceBus is the new component hotness. I suggest looking into this, I know we are.

Web Services design

Company A has async pooling based webservice for notifications. Company B checks for notifications. Every time when it reads new notifications A deletes them from the system. Thus subsequent read requests return only new notifications. There is also requirement for the client B to interrupt the connection if there is no response within 30 sec.
This causes one potential problem: Due to unexpected slowness it is possible for A get the request deleted a notification and send the response back while B is already interrupted the connection. Under this scenario notification gets lost. Now one can argue that the core problem lies within operation realm (the HTTP response must be delivered withing 20 sec ) still on practice it is not always feasible.
How to design B (the client) to avoid this problem?
One way I can see is to do not delete the notifications by A and make B be aware of its state, so that it knows starting from what ID it needs to process notifications, but that presumes that ID will be sequential. Which is controlled by A. Even if B defines its own sequence A still has to be altered to return it back.
Are there any other approaches?
Thanks!

Web services in general are unreliable enough that it's rarely a good idea to make a "read" request serve double-duty as a "delete" request, especially without the client's knowledge. There is just too much risk of a connection dropping or timing out. There is no way to get around this only by modifying the client, because it's the server that is at fault here - the way it's designed is fundamentally unsuited for a web service.
I think you're on the right track with the incrementing IDs idea. The client knows (or can be modified to know) which notifications it's received, so if it can supply the ID of the last message it's received when it polls for notifications, the server should be able to respond based on that ID.

It really seems like Company A's webservice should be synchronous instead of asynchronous. If that is not possible, it may be a good idea to send a "ACK"-like response to a new Company A webservice that indicates a specific notification was received (by Company B) and can be deleted.

Atomic read+write access

In a web service that I am working on, a user's data needs to be updated in the background - for example pulling down and storing their tweets. As there may be multiple servers performing these updates, I want to ensure that only one can update any single user's data at one time. Therefore, (I believe) I need a method of doing an atomic read (is the user already being updated) and write (no? Then I am going to start updating). What I need to avoid is this:
Server 1 sends request to see if user is being updated.
Server 2 sends request to see if user is being updated.
Server 1 receives response back saying the user is not being updated.
Server 2 receives response back saying the user is not being updated.
Server 1 starts downloading tweets.
Server 2 starts downloading the same set of tweets.
Madness!!!
Steps 1 and 3 need to be combined into an atomic read+write operation so that Step 2 would have to wait until Step 3 had completed before a response was given. Is there a simple mechanism for effectively providing a "lock" around access to something across multiple servers, similar to the synchronized keyword in Java (but obviously distributed across all servers)?

Take a loot at Dekker's algorithm, it might give you an idea.
http://en.wikipedia.org/wiki/Dekker%27s_algorithm

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js