PBFT consensus algorithm and double spending - blockchain

I am trying to figure out how the PBFT consensus algorithm deals with the problem of double spending. I've read lots of literature but cannot seem to find an answer.

PBFT is a consensus algorithm introduced by Miguel Castro and Barbara Liskov in 1999 to cope with malicious attacks and software errors, both of which can cause faulty nodes to exhibit Byzantine (i.e., arbitrary) behavior. PBFT was designed to work efficiently in asynchronous systems, unlike earlier BFT algorithms, which only worked in synchronous systems.
Here is what the research paper states:
Practical algorithm for state machine replication that tolerates Byzantine faults. The algorithm offers both liveness and safety provided at most ⌊(n − 1)/3⌋ out of a total of n replicas are simultaneously faulty. This means that clients eventually receive replies to their requests and those replies are correct according to linearizability. The algorithm works in asynchronous systems like the Internet and it incorporates important optimizations that enable it to perform efficiently.
Double-spending is a potential flaw in a digital or electronic cash scheme in which the same single digital token can be spent more than once. Unlike physical cash, a digital token consists of a digital file that can be duplicated or falsified.
A double-spending attack is a potential attack against cryptocurrencies, and it has actually happened to several of them, e.g. via a 51% attack.
But this problem can be prevented using consensus algorithms and a blockchain.
If two transactions attempt to spend the same tokens, each node will consider the first transaction it sees to be valid and the other invalid. Once the nodes disagree, there is no way to determine true balances, because each node's observations are considered equally valid. Consensus algorithms are the way to bring the nodes back in sync. With a blockchain, transactions are never technically "final", since a conflicting chain of blocks can always outgrow the current canonical chain; however, as more blocks are built on top of a transaction, it becomes increasingly unlikely and costly for another chain to overtake it, which is what prevents double spending in practice.

The first step in PBFT is to get 2f + 1 nodes to agree to execute the available transactions in the same order. This is done by routing all transactions through a primary node, which assigns a sequence number to each. All nodes that execute the transactions in the same order will reject the second spend. Since at most f nodes can be faulty, at least 2f + 1 − f = f + 1 nodes will accept the first spend and reject the second. When the client learns that f + 1 nodes have accepted the first spend, it can be certain that this is the consensus, since at least one of those nodes is non-faulty.
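To make that argument concrete, here is a minimal sketch in Python of just the ordering-and-reply part. It is not the real PBFT protocol (no pre-prepare/prepare/commit phases, view changes, or signed messages), and all names in it are made up for illustration. The primary's sequence numbers mean every replica executes the two conflicting spends in the same order, so every honest replica rejects the second one, and the client accepts a result once it sees f + 1 matching replies.

    from collections import Counter

    F = 1                     # assumed number of faulty replicas
    N = 3 * F + 1             # PBFT needs at least 3f + 1 replicas in total

    class Replica:
        def __init__(self):
            self.spent = set()            # token ids already spent on this replica

        def execute(self, tx):
            # Execute in the primary-assigned order; the second spend of the
            # same token is rejected deterministically by every honest replica.
            if tx["token"] in self.spent:
                return (tx["id"], "REJECT")
            self.spent.add(tx["token"])
            return (tx["id"], "ACCEPT")

    # The primary assigns sequence numbers, so every replica sees the same order.
    ordered = [
        {"id": "tx1", "token": "coin-42"},    # first spend
        {"id": "tx2", "token": "coin-42"},    # attempted double spend
    ]

    replicas = [Replica() for _ in range(N)]
    for tx in ordered:
        replies = [r.execute(tx) for r in replicas]
        # The client waits for f + 1 matching replies; at least one of those
        # replicas is honest, so the matching reply is the consensus outcome.
        reply, count = Counter(replies).most_common(1)[0]
        assert count >= F + 1
        print(reply)                          # ('tx1', 'ACCEPT') then ('tx2', 'REJECT')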

Related

Understanding Blockchain Reorg

I cannot find any good explanation about these 2 statements about reorg:
1. Reorgs can increase the number of nodes within a blockchain over time, causing a poorer user experience.
Why does a reorg increase the number of nodes?
2. When reorging becomes more common, attackers only need to beat a portion of honest miners (due to the “longest chain rule”) rather than all of them.
Why is this so?
1. Reorgs can increase the number of nodes within a blockchain over time, causing a poorer user experience.
It doesn't. The statement doesn't make sense; the source you are citing is likely incorrect.
2. When reorging becomes more common, attackers only need to beat a portion of honest miners (due to the “longest chain rule”) rather than all of them.
Ethereum doesn't have the longest chain rule but the heaviest chain rule, so this statement would need to be rewritten for Ethereum. The rule is probably about Bitcoin and other proof-of-work chains that rely on the longest chain rule. However, without more context it is hard to have a meaningful discussion of the security implications.

Basics of how a distributed, consistent key-value storage system returns the latest key when dealing with concurrent requests?

I am getting up to speed on distributed systems (studying for an upcoming interview), and specifically on the basics of how a distributed, consistent key-value storage system managed in memory works.
The specific questions I am stuck on, and would love just a high-level answer to if it's no trouble:
#1
Let's say we have 5 servers that are responsible for acting as readers, and I have one writer. When I write the value 'foo' to the key 'k1', I understand it has to propagate to all of those servers so they all store the value 'foo' for the key k1. Is this correct, or does the writer only write to a majority (quorum) for this to work?
#2
After #1 above takes place, let's say concurrently a read comes in for k1, and a write comes in to replace 'foo' with 'bar'; however, not all of the servers are updated with 'bar' yet. This means some hold 'foo' and some hold 'bar'. If I had lots of concurrent reads, it's conceivable some would return 'foo' and some 'bar' since it's not updated everywhere yet.
When we're talking about eventual consistency, this is expected, but if we're talking about strong consistency, how do you avoid #2 above? I keep seeing content about quorum and timestamps but on a high level, is there some sort of intermediary that sorts out what the correct value is? Just wanted to get a basic idea first before I dive in more.
Thank you so much for any help!
In doing more research, I found that "consensus algorithms" such as Paxos or Raft are the correct solution here. The idea is that your nodes need to arrive at a consensus on what the value is. If you read up on Paxos or Raft you'll learn everything you need to - it's quite complex to explain here, but there are videos/resources out there that cover this well.
Another thing I found helpful was learning more about Dynamo and DynamoDB. They deal with the same subject as well, although they are not strongly consistent by default.
Hope this helps someone, and message me if you'd like more details!
Reading about the CAP theorem will help you solve your problem. You are looking for consistency and partition tolerance in this question, so you have to sacrifice availability. The system needs to block and wait until all nodes finish writing. In other words, the change cannot be read before all nodes have updated it.
In theoretical computer science, the CAP theorem, also named Brewer's theorem after computer scientist Eric Brewer, states that any distributed data store can only provide two of the following three guarantees:
Consistency: every read receives the most recent write or an error.
Availability: every request receives a (non-error) response, without the guarantee that it contains the most recent write.
Partition tolerance: the system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.
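As a high-level illustration of the "quorum and timestamps" idea from the question, here is a deliberately simplified, single-threaded Python sketch (not production code, and not Paxos/Raft): with N replicas, a write that reaches W of them and a read that consults R of them are guaranteed to overlap in at least one replica whenever R + W > N, and versions are used to pick the newest value from that overlap.

    N, W, R = 5, 3, 3              # 5 replicas, write/read quorums of 3 (R + W > N)

    replicas = [dict() for _ in range(N)]   # each replica stores key -> (version, value)

    def write(key, value, version):
        acks = 0
        for rep in replicas:               # contact replicas until W have acked
            if key not in rep or rep[key][0] < version:
                rep[key] = (version, value)
            acks += 1
            if acks >= W:                  # a real system would retry stragglers later
                break

    def read(key):
        # Ask R replicas (here a different subset than the write touched) and
        # return the value carrying the highest version number.
        answers = [rep[key] for rep in replicas[-R:] if key in rep]
        return max(answers)[1] if answers else None

    write("k1", "foo", version=1)
    write("k1", "bar", version=2)
    print(read("k1"))                      # "bar": because R + W > N, the quorums overlap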

What is the benefit of a non-fault-tolerant blockchain network

I'm learning about the different Hyperledger-based blockchain frameworks and currently I'm reading about Sawtooth, even though the question is not particularly related to Sawtooth.
Given that PoET is as good a consensus algorithm as any, what I can't get my head around is the benefit of having a blockchain network which is not fault tolerant.
Not only for financial operations but for anything of value: even if there is no targeted attack, if we have a node which is not working correctly, and this node "wins the lottery" and gets to insert the next block, what is the mechanism (before or after this) to prevent the system from proceeding with a wrong state?
And if "not fault tolerant" indeed means exactly this - that there could be faulty behavior and it won't be detected - what is the purpose of using such a system, even if it's fast and scalable and so on, if there is a good chance of ending up with incorrect data?
I'm not trying to imply that those networks are useless; on the contrary, I'm trying to get a grasp of the ideas behind blockchain and the different variations there are, and because I'm sure there is a good reason for Sawtooth to exist I would like to find out where my logic fails.
The question talks about fault tolerance in general; however, the description in the question is really about Byzantine fault tolerance.
Broadly, we can distinguish Byzantine Fault Tolerance and Crash Fault Tolerance. Byzantine behavior is an unexpected, unpredictable result from a node; it can be intentional (a malicious act by the node) or unintentional (memory corruption, hardware issues). Crash fault tolerance is about the high availability of the system even though there can be random node failures in the network.
There's a general misconception that a Blockchain system should always be Byzantine Fault Tolerant. There can be multiple use cases for a Blockchain system, and what you want to achieve through the Blockchain is thus use-case specific.
For example, in most public networks there is an incentive for creating a block, and there is no established trust, or means of establishing trust, among the participants. A consensus algorithm which provides Byzantine Fault Tolerance is the natural choice there.
Another use case for a Blockchain is its immutability property, i.e. once data/state is added to the Blockchain it becomes computationally very difficult to modify it. In the case of a private Blockchain consortium, participants may optionally prove their identity to the other nodes upon request through some other means, and the immutability property could be what interests them. It may not matter who wins the election or who creates a block. To give you an instance, in the case of Hyperledger Sawtooth with Raft as the consensus engine, the leader always gets to create the block. Raft provides Crash Fault Tolerance in the network as long as a majority (50% + 1) of the nodes are alive.
Hyperledger Sawtooth PoET, when run on SGX, is Byzantine Fault Tolerant and gives all the nodes a fair random chance to construct the block. If PoET is run in simulator mode, then it only gives the latter capability. The protection against Byzantine behavior comes from the Trusted Execution Environment (TEE), namely Intel SGX.
Note: Blockchain systems are designed so that participants get a chance to validate blocks created by the winning node. They add the block to their ledger only after validation. In the case of a consortium, a node will be caught if it tries to manipulate data, and because there are other means of identifying who a participant is, Byzantine behavior can be detected.
A Blockchain is a distributed design solution; by virtue of its design it provides at least Crash Fault Tolerance for the system as a whole. Since the same copy of the data is replicated across the nodes (at least more than one node), the data is not lost even if one or two nodes in the network fail. High availability is guaranteed.
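To put rough numbers on the CFT vs. BFT distinction above, here is a small Python sketch of the usual rule-of-thumb thresholds: a crash-fault-tolerant protocol such as Raft keeps making progress as long as a majority of nodes is alive, while a Byzantine-fault-tolerant protocol such as PBFT tolerates at most ⌊(n − 1)/3⌋ arbitrarily misbehaving nodes. This is a generic rule of thumb, not something specific to Sawtooth.

    def crash_faults_tolerated(n):
        # Raft-style CFT: progress needs a majority (n // 2 + 1) of nodes alive,
        # so up to n - (n // 2 + 1) nodes may crash.
        return n - (n // 2 + 1)

    def byzantine_faults_tolerated(n):
        # PBFT-style BFT: safety and liveness hold with at most
        # floor((n - 1) / 3) arbitrarily misbehaving nodes.
        return (n - 1) // 3

    for n in (4, 5, 7, 10):
        print(f"n={n:2d}  crash faults tolerated: {crash_faults_tolerated(n)}  "
              f"byzantine faults tolerated: {byzantine_faults_tolerated(n)}")
    # n= 4  crash faults tolerated: 1  byzantine faults tolerated: 1
    # n= 5  crash faults tolerated: 2  byzantine faults tolerated: 1
    # n= 7  crash faults tolerated: 3  byzantine faults tolerated: 2
    # n=10  crash faults tolerated: 4  byzantine faults tolerated: 3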
I hope these points helped you to move next in your exploration.

How do clients of a distributed blockchain know about consensus?

I have a basic blockchain I wrote to explore and learn more about the technology. The only real world experience I have with them is in a one-to-one transaction from client to server, as a record of transactions. I'm interested in distributed blockchains now.
In its simplest, most theoretical form, how is consensus managed? How do peers know to begin writing transactions on the next block? You have to know when >50% of the entire pool has accepted some last block written. But p2p systems can be essentially unbounded, and you can't trust a third party to handle surety, so how is this accomplished?
edit: I now know roughly how bitcoin handles consensus:
The consensus determines the accepted blockchain. The typical rule of "longest valid chain first" ensures that only one variant is accepted. People may accept a blockchain after any number of confirmations; typically 6 is sufficient to ensure a clear winner.
However, this seems like a slow and least-deliberate method. It ensures that there is a certain amount of wasted work on the part of nodes that happen to be in a part of the network that had a local valid solution at roughly the same time as a generally accepted solution.
Are there better alternatives?
Interesting question. I would say blockchain technology provides only probabilistic consensus: with a certain confidence, the blockchain network agrees on something.
Viewing blockchain as a distributed system, we can say that its state is distributed: the blockchain is kept as a whole, but there are many distributed local copies. More interestingly, the operations are distributed: writes and reads can happen at different nodes concurrently. Read operations can be done locally on the local copy of the blockchain, but such a read can of course be stale if your local copy is not up to date. However, there is always an incentive for nodes in the blockchain network to keep their local copy up to date so that they can complete new transactions when necessary.
Write operations are the tricky part that blockchain must solve. As writes happen concurrently in a distributed fashion, blockchain must avoid inconsistencies such as double spending and somehow reach consensus on the current state. The way blockchain does this is probabilistic. First of all, it is made expensive to write to the chain by adding the "puzzle" to be solved, which reduces the probability that distributed writes happen concurrently; they can still happen, but with lower probability. In addition, since there is an incentive for nodes to keep their state up to date, nodes that receive the flooded write operation will validate it and accept it into their chain. I think the incentive to always keep the chain up to date is key here, because it ensures that the chain makes progress: a writer has a clear incentive to keep its chain up to date since it is competing under the "longest chain first" principle against other concurrent writers. For non-adversarial miners there is also an incentive to interrupt the current mining, accept a new write-block and restart the mining process, ensuring a sort of liveness in the system.
So blockchain relies on probabilistic consensus; what is the probability, then? The probability that two exactly equal branches keep growing in parallel is close to 0, assuming there is no large group of adversarial nodes taking over the network. With very high probability one branch will become longer than the other and be accepted, the network reaches consensus on that branch, and write operations in the shorter branch have to be retried. The big concern is of course large adversarial miner groups who might deliberately try to create forks in the blockchain to perform double-spending attacks, but that is only likely to succeed if they get close to 50% of the computational power in the network.
So to conclude: natural branching in the blockchain, which can happen because of concurrent writes (with the probability reduced by the puzzle solving), will with almost 100% probability converge to a single branch as write operations continue to happen, and the network reaches consensus on that single branch.
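One rough way to put numbers on "increasingly unlikely" is the gambler's-ruin style estimate from the Bitcoin whitepaper: if an attacker controls a fraction q of the total hash power (with q < 0.5), the chance of ever overtaking an honest chain that is already z blocks ahead is roughly (q/(1 − q))^z. The whitepaper's full analysis is a bit more involved, but this Python sketch captures why a handful of confirmations is usually considered enough:

    def catch_up_probability(q, z):
        # q: attacker's share of total hash power, z: blocks the attacker is behind.
        p = 1.0 - q
        if q >= p:
            return 1.0                     # a majority attacker eventually wins
        return (q / p) ** z

    for z in (1, 2, 6):
        print(f"z={z}: 10% attacker -> {catch_up_probability(0.10, z):.6f}, "
              f"30% attacker -> {catch_up_probability(0.30, z):.4f}")
    # z=1: 10% attacker -> 0.111111, 30% attacker -> 0.4286
    # z=2: 10% attacker -> 0.012346, 30% attacker -> 0.1837
    # z=6: 10% attacker -> 0.000002, 30% attacker -> 0.0062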
However, this seems like a slow and least-deliberate method. It
ensures that there is a certain amount of wasted work on the part of
nodes that happen to be in a part of the network that had a local
valid solution at roughly the same time as a generally accepted
solution.
Are there better alternatives?
Not that I can think of. There would be many more efficient solutions if all peers in the system were "under control" and you could make them follow some protocol, perhaps with a designated leader to decide the order of writes and ensure consensus, but that is not possible in a decentralized open system.
In a permissioned blockchain environment, where the participants are known in advance, the client can get cryptographic proof of the consensus (e.g. that the block was signed by at least 2/3 of the participants) and verify it. Usually this can be achieved using threshold signatures.
In public blockchains, AFAIK, there is no way to do this, since the set of participants is unknown and changes all the time.
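As a sketch of that permissioned case: since the client knows the validator set in advance, it only has to check that at least 2/3 of the validators signed the block. The Python below is illustrative only; verify_signature is a placeholder for whatever scheme the network actually uses (individual signatures or a single threshold signature), and the function and parameter names are made up.

    def verify_signature(public_key, block_hash, signature):
        # Placeholder: plug in the network's real scheme here (e.g. Ed25519
        # signatures per validator, or one aggregated threshold signature).
        raise NotImplementedError

    def has_consensus_proof(block_hash, signatures, validator_keys):
        # signatures: mapping validator id -> signature over block_hash
        # validator_keys: mapping validator id -> public key (known in advance)
        valid = sum(
            1
            for vid, sig in signatures.items()
            if vid in validator_keys and verify_signature(validator_keys[vid], block_hash, sig)
        )
        needed = -(-2 * len(validator_keys) // 3)   # ceil(2n / 3)
        return valid >= needed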

What is consensus in Hyperledger

At the link below, there is a paragraph that says:
http://www.coindesk.com/stellar-ripple-hyperledger-rivals-bitcoin-proof-work/
“Each node publishes a public key. Any message coming through the node is signed by the node to verify its format. Once enough responses that are identical are reached, then you can agree that is a valid transaction.”
My Understanding:
Once a transaction
Consensus is a mechanism by which the nodes in the blockchain decide that a transaction block can be appended to the blockchain. There are many consensus mechanisms - for example, Bitcoin uses a consensus mechanism called Proof of Work, and Ethereum uses a consensus mechanism called Proof of Stake. The consensus can be at the ledger level (all nodes have to agree) or at the transaction level (only the transacting nodes have to agree). In the case of Hyperledger, consensus is at the transaction level, meaning not all nodes need to engage in the consensus mechanism; only the transacting parties engage and arrive at a consensus. A detailed technical explanation of the PBFT (practical Byzantine fault tolerance) based Hyperledger Fabric consensus is given at this link:
http://hyperledger-fabric.readthedocs.io/en/release/txflow.html
A less technical explanation can be found here; this also talks about the different roles the nodes take in Hyperledger Fabric:
https://medium.com/@philippsandner/comparison-of-ethereum-hyperledger-fabric-and-corda-21c1bb9442f6
Hyperledger is an umbrella project that aims at creating a modular approach for assembling blockchain solutions. It has a layered architecture, including a separate consensus layer. The goal is that you should be able to swap the consensus policy in and out according to your business need.
This table from Hyperledger Architecture, Volume 1 gives you examples of its consensus approaches in various child projects.
In this type of consensus:
1. A transaction is performed, i.e. someone buys something from someone else.
2. The person who wants this transaction to become a legitimate block on the blockchain sends out a cryptographic hash.
3. The hash is a function which scrambles its inputs and creates an output.
4. There is no easy way to solve for the original inputs, so peers will put random numbers into the function in an attempt to find the inputs that created the hash.
5. After enough of these peers have independently solved the problem, the transaction is considered legitimate and goes on the ledger.
In the bitcoin model, this means the bitcoins are immediately moved to the other party's account.
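The "puzzle" mentioned in those steps is, in Bitcoin's case, proof of work: repeatedly hash the block data with a changing nonce until the hash falls below a target. Here is a toy Python version (real networks hash full block headers and use a vastly higher difficulty), just to show why solving is expensive while verifying is cheap:

    import hashlib

    def mine(block_data: str, difficulty: int = 4):
        # Try nonces until the SHA-256 hash of the data starts with `difficulty` zeros.
        prefix = "0" * difficulty
        nonce = 0
        while True:
            digest = hashlib.sha256(f"{block_data}:{nonce}".encode()).hexdigest()
            if digest.startswith(prefix):
                return nonce, digest
            nonce += 1

    nonce, digest = mine("alice pays bob 5")
    print(nonce, digest)      # finding the nonce is slow; re-checking it takes one hash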
The number of peers needed to validate the transaction is often derived from a Byzantine fault tolerance algorithm. You can read the full paper at the link below, but it basically means that the system needs:
n ≥ 3f + 1 peers in total, where f is the number of faulty peers that can be tolerated, and 2f + 1 of those peers must agree on each request.
For example, if you have 4 peers then according to the algorithm three of them must agree before consensus can be achieved.
Here is the example with 4 peers:
n = 4
n ≥ 3f + 1, so f ≤ (4 - 1) / 3 = 1
total failures can only be 1
2f + 1 = 2 × 1 + 1 = 3 peers must agree
Included below, along with the paper on the algorithm, are a slideshow that may help in understanding it and a link to a video about the bitcoin model in general.
http://pmg.csail.mit.edu/papers/osdi99.pdf
http://www.cs.utah.edu/~stutsman/cs6963/public/pbft.pdf
https://www.youtube.com/watch?v=GMKgB3zZ1so