Understanding Blockchain Reorg - blockchain

I cannot find any good explanation of these two statements about reorgs:
1. Reorgs can increase the number of nodes within a blockchain over time, causing a poorer user experience.
Why would a reorg increase the number of nodes?
2. When reorging becomes more common, attackers only need to beat a portion of honest miners (due to the “longest chain rule”) rather than all of them.
Why is this so?

1. Reorgs can increase the number of nodes within a blockchain over time, causing a poorer user experience.
It doesn't. The statement doesn't make sense; the source you are citing is likely incorrect.
2. When reorging becomes more common, attackers only need to beat a portion of honest miners (due to the “longest chain rule”) rather than all of them.
Ethereum doesn't use the longest chain rule but the heaviest chain rule, so this statement would need to be rewritten for Ethereum. The rule is probably about Bitcoin and other proof-of-work chains that rely on the longest chain rule. In any case, without more context it is hard to have a meaningful security discussion.
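As a rough illustration of the difference between the two fork-choice rules, here is a minimal Python sketch. Everything in it (the Block type, the difficulty values, the example forks) is made up for the example; it is not how any real client implements fork choice.

from dataclasses import dataclass

@dataclass
class Block:
    difficulty: int  # work contributed by this block

def longest_chain(forks):
    # "Longest chain rule": pick the fork with the most blocks.
    return max(forks, key=len)

def heaviest_chain(forks):
    # "Heaviest chain rule": pick the fork with the most accumulated work.
    return max(forks, key=lambda fork: sum(b.difficulty for b in fork))

fork_a = [Block(10) for _ in range(5)]  # 5 blocks, total work 50
fork_b = [Block(20) for _ in range(4)]  # 4 blocks, total work 80

assert longest_chain([fork_a, fork_b]) is fork_a
assert heaviest_chain([fork_a, fork_b]) is fork_b

The point of the sketch: under the heaviest-chain rule an attacker has to outweigh the honest chain's accumulated work, which is not the same thing as simply producing more blocks.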

Related

PBFT consensus algorithm and double spending

I am trying to figure out how the PBFT consensus algorithm deals with the problem of double spending. I've read lots of literature but cannot seem to find an answer.
PBFT is a consensus algorithm introduced by Miguel Castro and Barbara Liskov in 1999. It was designed to tolerate Byzantine faults: malicious attacks and software errors can cause faulty nodes to exhibit Byzantine (i.e., arbitrary) behavior. PBFT works efficiently in asynchronous systems, whereas earlier BFT algorithms only worked in synchronous systems.
Here is what the research paper states:
A practical algorithm for state machine replication that tolerates Byzantine faults. The algorithm offers both liveness and safety provided at most ⌊(n − 1)/3⌋ out of a total of n replicas are simultaneously faulty. This means that clients eventually receive replies to their requests and those replies are correct according to linearizability. The algorithm works in asynchronous systems like the Internet and it incorporates important optimizations that enable it to perform efficiently.
Double-spending is a potential flaw in a digital or electronic cash scheme in which the same single digital token can be spent more than once. Unlike physical cash, a digital token consists of a digital file that can be duplicated or falsified.
A double-spending attack is a potential attack against cryptocurrencies that has actually happened to several of them, e.g. via 51% attacks.
This problem can be prevented using consensus algorithms and a blockchain.
If two transactions attempt to spend the same tokens, each node will consider the first transaction it sees to be valid and the other invalid. Once the nodes disagree, there is no way to determine true balances, as each node's observations are considered equally valid. Consensus algorithms are the way to bring the nodes back in sync. With a blockchain, transactions are never technically "final", since a conflicting chain of blocks can always outgrow the current canonical chain; however, as more blocks are built on top of a transaction, it becomes increasingly unlikely and costly for another chain to overtake it, which is what prevents double spending in practice.
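A minimal sketch of the "first transaction seen wins" rule described above (the coin identifier and function below are made up for illustration; real nodes track unspent outputs and check full transaction validity):

spent_inputs = set()

def accept_transaction(inputs):
    # Reject the transaction if any input has already been spent; otherwise
    # record its inputs as spent ("first transaction seen wins").
    if any(i in spent_inputs for i in inputs):
        return False
    spent_inputs.update(inputs)
    return True

assert accept_transaction(["coin-42"]) is True
assert accept_transaction(["coin-42"]) is False  # second spend of the same coin is rejected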
The first step in PBFT is to get 2f + 1 nodes to agree to execute available transactions in the same order. This is done by routing all transactions through a primary node which assigns a sequence number. All nodes that execute the transactions in the same order will reject the second spend. Since at most f nodes can be faulty, at least 2f + 1 − f = f + 1 nodes will accept the first spend and reject the second. When the client learns that f + 1 nodes have accepted the first spend, it can be certain that this is the consensus, since at least one of those nodes is non-faulty.
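As a rough sketch of that client-side check (illustrative only, not the full PBFT protocol; the replica names and reply strings are made up), in Python:

from collections import Counter

def client_decision(replies, f):
    # replies maps replica id -> reply payload; the client accepts a result
    # once f + 1 replicas return the same reply, since at most f are faulty.
    reply, votes = Counter(replies.values()).most_common(1)[0]
    return reply if votes >= f + 1 else None

# 4 replicas tolerate f = 1 fault; three matching replies settle the first spend.
replies = {"r0": "accept-spend-1", "r1": "accept-spend-1",
           "r2": "accept-spend-1", "r3": "accept-spend-2"}
assert client_decision(replies, f=1) == "accept-spend-1"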

In Substrate, What is the difference between Babe, Aura, and Grandpa

Substrate supports "pluggable consensus" so a developer can choose from among several consensus algorithms. It comes standard with four algorithms:
Aura
Babe
Proof of Work
Grandpa
Some of these (e.g. Babe and Grandpa) can even be used together in a single node. What are the differences between each consensus algorithm, and which ones can or should be used together?
For a blockchain to be live (continue growing and adding new transactions), two things must happen to solve the problem of distributed consensus. Typically, these jobs are performed by full nodes as is the case with the default Substrate node.
Block Authoring. Nodes create new blocks. Each new block contains a reference to a parent block.
Block Finalization. When forks appear in the chain, nodes must choose which side of the fork to consider the real or "canonical" one. Once a block is finalized, the canonical chain will always contain it.
Let's look at each of the algorithms mentioned individually, and see how they accomplish those tasks.
Block Authoring
Aura
Aura primarily provides block authoring. In Aura, a known set of authorities is allowed to produce blocks. The authorities must be chosen before block production begins and all authorities must know the entire authority set. Time is divided into "slots" of a fixed length. During each slot one block is produced, and the authorities take turns producing blocks in order, forever.
In Aura, forks only happen when it takes longer than the slot duration for a block to traverse the network. Thus forks are uncommon in good network conditions.
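As a sketch of that round-robin assignment (the authority names are made up, and this is not how Substrate actually implements it, just the idea):

authorities = ["alice", "bob", "charlie"]

def aura_author(slot):
    # Round-robin: the author of a slot is the slot number modulo the size
    # of the fixed, publicly known authority set.
    return authorities[slot % len(authorities)]

assert [aura_author(s) for s in range(5)] == ["alice", "bob", "charlie", "alice", "bob"]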
Babe
Babe also primarily provides block authoring. Like Aura, it is a slot-based consensus algorithm with a known set of validators. In addition, each validator is assigned a weight which must be assigned before block production begins. Unlike Aura, the authorities don't take turns in order. Instead, during each slot, each authority generates a pseudorandom number using a VRF. If the random number is lower than their weight, they are allowed to produce a block.
Because multiple validators may be able to produce a block during the same slot, forks are more common in Babe than they are in Aura, and are common even in good network conditions.
Substrate's implementation of Babe also has a fallback mechanism for when no authorities are chosen in a given slot.
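A rough sketch of that slot lottery in Python. The real VRF is replaced here by a keyed hash and the threshold scaling is simplified, so treat every name and constant below as illustrative only:

import hashlib

def slot_lottery_output(secret, slot):
    # Stand-in for a VRF output: map (secret, slot) to a number in [0, 1).
    digest = hashlib.sha256(secret + slot.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def may_author(secret, slot, weight, c=0.5):
    # Eligible to author the slot if the private lottery output falls below
    # a threshold scaled by the authority's weight.
    return slot_lottery_output(secret, slot) < c * weight

# Several authorities may win the same slot (hence forks), or none may win,
# which is why Substrate's Babe also has a fallback for empty slots.
print([may_author(b"alice-secret", s, weight=0.6) for s in range(10)])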
Proof of Work
Proof of Work also provides block authoring. Unlike Babe and Aura, it is not slot-based, and does not have a known authority set. In proof of Work, anyone can produce a block at any time, so long as they can solve a computationally challenging problem (typically a hash preimage search). The difficulty of this problem can be tuned to provide a statistical target block time.
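A toy version of that search in Python (the header bytes and target value are arbitrary; real chains use their own header format and difficulty adjustment):

import hashlib

def mine(header, target):
    # Search for a nonce whose block hash falls below the difficulty target.
    nonce = 0
    while True:
        digest = hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce  # a valid proof of work
        nonce += 1

# A larger target makes the puzzle easier; tuning it sets the average block time.
print("found nonce:", mine(b"block-header", target=2**244))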
Block Finalization
Probabilistic methods
Each of the block authoring mechanisms we've discussed previously needs to know where on the chain it should build the next block. Methods such as the "longest chain rule" or the "heaviest observed subtree" often work in practice and provide probabilistic finality. That is, with each new block that is added to a chain, the probability that it will be reverted decreases, approaching zero. When true certainty that a block is final is desired, a more sophisticated game can be used.
Grandpa
Grandpa provides block finalization. It has a known weighted authority set like Babe. However, Grandpa does not author blocks; it just listens to gossip about blocks that have been produced by some authoring engine like the three discussed above. Each authority participates in two rounds of voting on blocks. The details of the voting are beyond the scope of this post. Once 2/3 of the Grandpa authorities have voted for a particular block, it is considered finalized.
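A minimal sketch of just that 2/3 threshold check (the authority names and unit weights are made up, and the actual two-round GRANDPA voting protocol is omitted entirely):

def is_finalized(voted_weight, total_weight):
    # Treat a block as finalized once the weight voting for it is at least
    # 2/3 of the total authority weight.
    return 3 * voted_weight >= 2 * total_weight

authority_weights = {"alice": 1, "bob": 1, "charlie": 1, "dave": 1}
votes_for_block = ["alice", "bob", "charlie"]  # 3 of the 4 units of weight

total = sum(authority_weights.values())
voted = sum(authority_weights[a] for a in votes_for_block)
assert is_finalized(voted, total)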
Hybrid Consensus
In general a block authoring engine and a finality gadget can be used together in a single chain, as Babe and Grandpa are in the code linked in the question. When such a system is used, block authoring engines must be made aware of blocks that are finalized so that they don't waste time building on top of blocks that will never be in the canonical chain.
Note on weights: Babe, Grandpa, and many other algorithms that do not come bundled with Substrate rely on weights. Consensus algorithms themselves typically do not dictate how the weights are assigned, but rather assume that they are assigned somehow, leaving the assignment to an external mechanism. In public networks, it is common to assign weights based on how many tokens are staked. In the default Substrate node, all weights are set to 1 because the Phragmén algorithm keeps all validators close to equally staked.

How to protect from 51% Attack?

What is the best way to reduce the risk of this attack?
How to protect from 51% Attack?
The nature of the system means that this attack cannot be prevented. Think of it this way: if you have a perfectly decentralized system in which the participants have control over the network (not some centralized authority), then the users get to vote on changes. The way to vote in a proof-of-work blockchain is with your mining hashpower. If a majority (>50%) of the network votes for a change, then the change goes into effect (theoretically). So, how could you prevent this unless you centralize the network?
In practice, an attacker would likely want much more than 51%: not only do they have to outpace the rest of the network, they have to keep doing so for every block after the one they want to modify, because honest miners keep extending the chain while the attacker is trying to catch up. To have a good chance of pulling the attack off quickly and reliably, they need considerably more hashpower.
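That catch-up race can be quantified with the gambler's-ruin argument from the Bitcoin whitepaper: an attacker holding a fraction q of the hashpower who is z blocks behind eventually catches up with probability (q/p)^z when q < p, and with probability 1 when q >= p. A small Python sketch of that formula (the sample values are arbitrary):

def catch_up_probability(q, z):
    # q: attacker's share of hashpower, z: number of blocks behind.
    p = 1.0 - q  # honest share
    return 1.0 if q >= p else (q / p) ** z

# Below 50% the attacker's odds fall off quickly with each confirmation;
# at or above 50% catching up is only a matter of time.
for q in (0.10, 0.30, 0.45, 0.51):
    print(q, [round(catch_up_probability(q, z), 4) for z in (1, 3, 6)])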
Prevention
The real answer is you can't really prevent it, since it is a decentralized network, but if you are designing a new blockchain, the answer is to make it as decentralized as possible. Here are some considerations:
Commoditization of the mining hardware (commoditizing ASICs). Note this goes against some conventional thinking that hashing algorithms should be ASIC-resistant, but there is a good article, "ASICs and Decentralization FAQs", that explains why ASIC resistance is a bad idea. Users who have to pay a lot, or find it difficult to get hashpower, will likely not mine, and mining will be left to a few large players with the resources to do so. This results in more centralization of mining.
Avoid forking an existing coin with much larger hashpower. Users of the original coin now own coins on your new chain and are incentivized to attack it if they have a much larger portion of hashpower they can switch over to the new coin. If you do fork an existing coin, consider changing the hashing algorithm so miners of the original coin would have to invest more capital in order to attack.

How do clients of a distributed blockchain know about consensus?

I have a basic blockchain I wrote to explore and learn more about the technology. The only real world experience I have with them is in a one-to-one transaction from client to server, as a record of transactions. I'm interested in distributed blockchains now.
In its simplest, most theoretical form, how is consensus managed? How do peers know to begin writing transactions on the next block? You have to know when >50% of the entire pool has accepted some last block written. But p2p systems can be essentially unbounded, and you can't trust a third party to handle surety, so how is this accomplished?
edit: I now know roughly how bitcoin handles consensus:
The consensus determines the accepted blockchain. The typical rule of "longest valid chain first" ensures that only one variant is accepted. People may accept a blockchain after any number of confirmations, typically 6 is sufficient to ensure a clear winner.
However, this seems like a slow and least-deliberate method. It ensures that there is a certain amount of wasted work on the part of nodes that happen to be in a part of the network that had a local valid solution at roughly the same time as a generally accepted solution.
Are there better alternatives?
Interesting question. I would say that blockchain technology provides only probabilistic consensus: with a certain confidence, the blockchain network agrees on something.
Viewing the blockchain as a distributed system, we can say that its state is distributed: the blockchain is kept as a whole, but there are many distributed replicas of local copies. More interestingly, the operations are distributed: writes and reads can happen at different nodes concurrently. Read operations can be done locally on the local copy of the blockchain, but such a read can of course be stale if the local copy is not up to date. However, there is always an incentive for nodes in the blockchain network to keep their local copy up to date so that they can complete new transactions when necessary.
Write operations are the tricky part that the blockchain must solve. As writes happen concurrently in a distributed fashion, the blockchain must avoid inconsistencies such as double spending and somehow reach consensus on the current state. The way it does this is probabilistic. First of all, writing to the chain is made expensive by adding the "puzzle" to be solved, which reduces the probability that different distributed writes happen concurrently; they can still happen, just with lower probability. In addition, since there is an incentive for nodes in the network to keep their state up to date, nodes that receive the flooded write operation will validate it and accept it into their chain. I think the incentive to always keep the chain up to date is key here, because it ensures the chain makes progress: a writer has a clear incentive to keep its chain up to date since it is competing under the "longest-chain-first" principle against other concurrent writers. For non-adversarial miners there is also an incentive to interrupt the current mining, accept a new write-block, and restart the mining process, ensuring a sort of liveness in the system.
So blockchain relies on probabilistic consensus; what is the probability then? The probability that two equally long branches keep growing in parallel is close to 0, assuming there is no large group of adversarial nodes taking over the network. With very high probability one branch will become longer than the other and be accepted, the network will reach consensus on that branch, and write operations in the shorter branch have to be retried. The big concern is of course large adversarial miner groups who might deliberately try to create forks in the blockchain to perform double-spending attacks, but that is only likely to succeed if they get close to 50% of the computational power in the network.
So, to conclude: natural branching in the blockchain, which can happen due to concurrent writes (with a probability reduced by the puzzle-solving), will with almost 100% probability converge to a single branch as write operations continue to happen, and the network reaches consensus on that single branch.
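To make that convergence argument a bit more tangible, here is a tiny simulation (the probabilities, lead threshold, and seeds are arbitrary): two competing branches grow block by block, each new block extending branch A with probability p or branch B otherwise, and the race ends when one branch is k blocks ahead.

import random

def resolve_fork(p=0.6, k=3, seed=0):
    rng = random.Random(seed)
    lead = 0  # positive: branch A is ahead, negative: branch B is ahead
    while abs(lead) < k:
        lead += 1 if rng.random() < p else -1
    return "A" if lead > 0 else "B"

# Unless the two sides have exactly equal hashpower, one branch pulls ahead
# and the network converges on it; the work on the losing branch is discarded.
print([resolve_fork(seed=s) for s in range(10)])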
However, this seems like a slow and least-deliberate method. It ensures that there is a certain amount of wasted work on the part of nodes that happen to be in a part of the network that had a local valid solution at roughly the same time as a generally accepted solution.
Are there better alternatives?
Not that I can think of. There would be many more efficient solutions if all peers in the system "were under control": you could make them follow some protocol and perhaps have a designated leader dictate the order of writes and ensure consensus, but that is not possible in a decentralized, open system.
In the permissioned blockchain environment, where the participants are known in advance, a client can get cryptographic proof of the consensus (e.g. that it was signed by at least 2/3 of the participants) and verify it. Usually this can be achieved using threshold signatures.
In the public blockchains, AFAIK, there is no way to do this since the number of participants is unknown/changes all the time.

Why parallel computing helps the use of non-local resources?

I've been reading about parallel computing and it's obvious how it helps saving time and solving larger problems, but I don't get how it takes advantage of the use of non-local resources.
Better to ask IF before asking WHY:
With all due respect, the answer to the IF is not a given at all.
Before ANY attempt to change a pure-[SERIAL] code-execution scheduling into any form of either "just"-[CONCURRENT] or true-[PARALLEL] processing, first collect sufficient quantitative evidence that such an effort will bring the expected set of performance benefits.
This is not obvious, and not every professional designer starts by comparing apples to apples.
There are costs to be paid, ALWAYS:
If speaking about using resources, always start by accounting for all the costs that such a use accrues, be it a local or a non-local resource. NUMA architectures have brought many surprises once the costs of use are indeed accounted for.
Modern criticism of the original Amdahl's Law formulation identifies a principal requirement to also include all the add-on overhead costs, as the original, overhead-ignorant (or overhead-naive, if one wishes) formulation was immensely misleading.
The next cardinal vector of the contemporary criticism comes from the fact that the [PAR]-convertible part of the problem is not infinitely divisible: it exhibits some further-indivisible "atomic" part that will never benefit from additional free resources (even if such resources were available in almost infinite capacity), because its internal dependency chain cannot be split or distributed.
There are not many problems that enjoy a [PARALLEL]-fraction above 90%, so always keep in mind that even the overhead-naive formulation demonstrates a glass ceiling for the expected acceleration of less than 10x, even with an infinite number of processors (and check both parts of the criticism of the original formulation). On the very contrary, each and every improvement in the [SERIAL]-fraction of the process will always influence the gain in performance 1:1.
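For the record, the overhead-naive bound being referred to is easy to reproduce (the parallel fraction and processor counts below are just example values):

def amdahl_speedup(p, n):
    # Overhead-naive Amdahl's Law: S(N) = 1 / ((1 - p) + p / N),
    # where p is the parallel fraction and n the number of processors.
    return 1.0 / ((1.0 - p) + p / n)

# Even with a 90% parallel fraction the speedup never reaches 10x,
# no matter how many processors are thrown at the problem.
for n in (2, 8, 64, 1024, 65536):
    print(n, round(amdahl_speedup(0.90, n), 3))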
Answer:
So, IF it has first been validated that some non-local resource can help more than what has to be (and always will be) paid, in the sum of all the types of add-on costs, for the chance to use such a distributed-system, parallel-processing, non-local resource, then WHY NOT.
Understanding the Economy-of-Use rules the game.