I really don't get the difference between blockchain scalability and blockchain performance. Can someone explain it to me?
The differences are conceptual and categorical. Blockchain scalability as a concept has wider significance: it concerns the scalability of a blockchain network, its nodes, its protocol, and so on. Blockchain scalability can be measured with a collection of parameters such as transactions per second, latency, and response time. It is the ability of a blockchain network to scale to the demands of the participating nodes.
Blockchain performance is quite a subjective term compared with scalability. Performance is the current throughput of a live system. Blockchain performance can be measured in terms of the current capacity of the nodes and the network to manage data at rest and data in motion, or in terms of total active users and total concurrent users. Performance improvement and performance optimisation can be done without scaling the system. In the world of blockchain systems, methods like pruning, zero-knowledge proofs and compression are widely adopted performance improvement techniques.
The following diagram from The Art of Scalability book is a great illustration of the different dimensions of scalability: scale by cloning, scale by functional decomposition, and scale by partitioning. These are three different ways to scale systems, and blockchain protocols can be scaled along the same three axes: by increasing the number of identical nodes (cloning), by introducing additional layers (side chains, state channels, etc.), and by creating shards.
Performance is an indication of the responsiveness of a system, i.e. its ability to execute an action within a given time interval. Even if your blockchain is handling, say, only 5 transactions, how well is it handling them? Scalability, in contrast, is the ability of a system either to handle increases in load without an impact on performance, or to have its available resources readily increased.
When we talk about blockchain networks, I can suggest the following approach:
Performance is the rate at which useful data is added to the blockchain state.
Scalability is the maximal level of performance, security and decentralisation that the target blockchain network can handle simultaneously.
To keep it short and simple: blockchain scalability is the ability of a blockchain network to expand its capacity in order to accommodate a growing number of transactions, while blockchain performance is a measure of how well a blockchain network can process transactions in a timely and efficient manner.
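Since the answers above treat transactions per second and latency as the measurable side of these concepts, here is a minimal C++ sketch of how throughput could be computed from block metadata; the Block struct, the field names, and the sample numbers are purely illustrative assumptions, not any particular chain's data model.

```cpp
#include <iostream>
#include <vector>

// Hypothetical, simplified block metadata: just enough to compute throughput.
struct Block {
    long timestamp;   // seconds since epoch when the block was appended
    int  txCount;     // number of transactions contained in the block
};

// Throughput (transactions per second) over a window of consecutive blocks.
double transactionsPerSecond(const std::vector<Block>& window) {
    if (window.size() < 2) return 0.0;
    long txs = 0;
    for (const Block& b : window) txs += b.txCount;
    long elapsed = window.back().timestamp - window.front().timestamp;
    return elapsed > 0 ? static_cast<double>(txs) / elapsed : 0.0;
}

int main() {
    // Illustrative numbers only: 4 blocks, ~15 s apart, a few thousand tx each.
    std::vector<Block> window = {
        {1700000000, 2500}, {1700000015, 2400},
        {1700000030, 2600}, {1700000045, 2500}
    };
    std::cout << "observed throughput: "
              << transactionsPerSecond(window) << " tx/s\n";
}
```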
What is the best way to reduce the risk of this attack?
How to protect from 51% Attack?
The nature of the system means that this attack cannot be prevented. Think of it this way, if you have a perfectly decentralized system in which the participants have control over the network (not some centralized authority), then the users get to vote on changes. The way to vote in blockchain is with your mining hashpower. If a majority (>50%) of the network votes on a change, then the change goes into effect (theoretically). So, how could you prevent this unless you centralize the network?
Now, in actuality, an attacker would likely need much more than 51%, because not only do they have to outpace the network, they have to do so for every block after the one they want to modify (what if a new block is mined by someone else while they are trying to outpace the network?). They would need much more hashpower to have a good chance of successfully pulling it off.
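To put a number on "a good chance", here is a small C++ sketch of the well-known catch-up estimate from the Bitcoin whitepaper (a gambler's-ruin argument): if the honest network finds blocks with probability p and the attacker with probability q = 1 - p, the chance the attacker ever catches up from z blocks behind is (q/p)^z when q < p, and 1 otherwise. The function name and the sample values are my own illustration, not part of the original answer.

```cpp
#include <cmath>
#include <cstdio>

// Probability that an attacker holding fraction q of the total hashpower ever
// catches up from z blocks behind the honest chain (Bitcoin whitepaper, sec. 11).
double catchUpProbability(double q, int z) {
    double p = 1.0 - q;              // honest network's share of hashpower
    if (q >= p) return 1.0;          // a majority attacker eventually wins
    return std::pow(q / p, z);       // gambler's-ruin estimate: (q/p)^z
}

int main() {
    // Illustrative: attackers with 30% and 45% hashpower, 6 blocks behind.
    std::printf("q=0.30, z=6: %.6f\n", catchUpProbability(0.30, 6));
    std::printf("q=0.45, z=6: %.6f\n", catchUpProbability(0.45, 6));
}
```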
Prevention
The real answer is that you can't really prevent it, since it is a decentralized network; but if you are designing a new blockchain, the answer is to make it as decentralized as possible. Here are some considerations:
Commoditization of the mining hardware (commoditizing ASICs). Note that this goes against some conventional thinking that hashing algorithms should be ASIC-resistant, but there is a good article that explains why that is a bad idea: ASICs and Decentralization FAQs. Users who have to pay a lot, or who find it difficult to get hashpower, will likely not mine, and mining will be left to a few large players with the resources to do so. This results in more centralization of mining.
Avoid forking an existing coin with much larger hashpower. Users of the original coin now own coins on your new chain and are incentivized to attack it if they have a much larger portion of hashpower they can switch over to the new coin. If you do fork an existing coin, consider changing the hashing algorithm so miners of the original coin would have to invest more capital in order to attack.
I have a basic blockchain I wrote to explore and learn more about the technology. The only real world experience I have with them is in a one-to-one transaction from client to server, as a record of transactions. I'm interested in distributed blockchains now.
In its simplest, most theoretical form, how is consensus managed? How do peers know to begin writing transactions on the next block? You have to know when >50% of the entire pool has accepted some last block written. But p2p systems can be essentially unbounded, and you can't trust a third party to handle surety, so how is this accomplished?
Edit: I now know roughly how Bitcoin handles consensus:
The consensus determines the accepted blockchain. The typical rule of "longest valid chain first" ensures that only one variant is accepted. People may accept a blockchain after any number of confirmations; typically 6 is sufficient to ensure a clear winner.
However, this seems like a slow and least-deliberate method. It ensures that there is a certain amount of wasted work on the part of nodes that happen to be in a part of the network that had a local valid solution at roughly the same time as a generally accepted solution.
Are there better alternatives?
Interesting question. I would say blockchain technology achieves only probabilistic consensus: with a certain confidence, the blockchain network agrees on something.
Viewing blockchain as a distributed system, we can say that the state of the blockchain is distributed: the blockchain is kept as a whole, but there are many distributed local replicas. More interestingly, the operations are distributed: writes and reads can happen at different nodes concurrently. Read operations can be served from the local copy of the blockchain, but such a read can of course be stale if the local copy is not up to date. However, there is always an incentive for nodes in the blockchain network to keep their local copy up to date, so that they can complete new transactions when necessary.
Write operations are the tricky part that a blockchain must solve. As writes happen concurrently in a distributed fashion, the blockchain must avoid inconsistencies such as double spending and somehow reach consensus on the current state. The way blockchain does this is probabilistic. First of all, writing to the chain is made expensive by the "puzzle" that has to be solved, which reduces the probability that different distributed writes happen concurrently; they can still happen, but with lower probability. In addition, since nodes in the network have an incentive to keep their state up to date, nodes that receive the flooded write operation will validate it and accept it into their chain. I think the incentive to always keep the chain up to date is key here, because it ensures that the chain makes progress: a writer has a clear incentive to keep its chain up to date, since it is competing under the "longest chain first" principle against other concurrent writers. For non-adversarial miners there is also an incentive to interrupt the current mining, accept a new write block, and restart the mining process, which ensures a sort of liveness in the system.
So blockchain relies on probabilistic consensus; what is the probability, then? The probability of two equally long branches growing in parallel indefinitely is close to 0, assuming there is no large group of adversarial nodes taking over the network. With very high probability one branch will become longer than the other and be accepted, the network will reach consensus on that branch, and write operations in the shorter branch will have to be retried. The big concern is of course large adversarial miner groups who might deliberately try to create forks in the blockchain to perform double-spending attacks, but that is only likely to succeed if they control close to 50% of the computational power in the network.
So, to conclude: natural branching in the blockchain, which can happen because of concurrent writes (with a probability reduced by the puzzle solving), will with almost 100% probability converge to a single branch as write operations continue to happen, and the network reaches consensus on that single branch.
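As a toy illustration of the "longest valid chain first" rule discussed above, here is a minimal C++ sketch of how a node could pick among competing branches. The Chain type, the placeholder validity check, and the tie handling are simplifying assumptions of mine, not the actual Bitcoin implementation (which compares accumulated proof-of-work rather than raw length).

```cpp
#include <cstdio>
#include <string>
#include <vector>

// A branch is just an ordered list of block hashes in this toy model.
using Chain = std::vector<std::string>;

// Placeholder: a real node re-validates every block and transaction.
bool isValid(const Chain& chain) {
    return !chain.empty();
}

// Fork choice: among the known branches, adopt the longest valid one.
// Ties are broken by keeping the branch that was seen first.
const Chain* chooseChain(const std::vector<Chain>& branches) {
    const Chain* best = nullptr;
    for (const Chain& c : branches) {
        if (!isValid(c)) continue;
        if (best == nullptr || c.size() > best->size())
            best = &c;   // shorter branches are discarded; their txs are retried
    }
    return best;
}

int main() {
    std::vector<Chain> branches = { {"a", "b"}, {"a", "b", "c"} };
    const Chain* winner = chooseChain(branches);
    std::printf("winning branch length: %zu\n", winner ? winner->size() : 0);
}
```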
However, this seems like a slow and least-deliberate method. It ensures that there is a certain amount of wasted work on the part of nodes that happen to be in a part of the network that had a local valid solution at roughly the same time as a generally accepted solution.
Are there better alternatives?
Not that I can think of. There would be many more efficient solutions if all peers in the system were "under control" and you could make them follow some protocol, perhaps with a designated leader to decide the order of writes and ensure consensus, but that is not possible in a decentralized open system.
In a permissioned blockchain environment, where the participants are known in advance, a client can obtain cryptographic proof of the consensus (e.g. that a block was signed by at least 2/3 of the participants) and verify it. Usually this can be achieved using threshold signatures.
In public blockchains, AFAIK, there is no way to do this, since the number of participants is unknown and changes all the time.
I'm implementing a performance-heavy two-party protocol in C++14, using multithreading, and am currently using ZeroMQ as the network layer.
The application has the following simple architecture:
One main server-role,
One main client-role,
Both server and client spawn a fixed number n of threads
All n concurrent thread-pairs execute a performance- and communication-heavy mutual, but exclusive, protocol exchange; i.e. they run as n fixed pairs and should not mix or interchange any data except with their fixed opposite peer.
My current design uses a single ZeroMQ Context() instance on both the server and the client, shared between all n local threads, and each client/server thread-pair creates a ZMQ_PAIR socket (I just increment the port number) on the local, shared context for communication.
My question
Is there a smarter or more efficient way of doing this?
i.e.: is there a natural way of using ROUTER and DEALER sockets that might increase performance?
I do not have much experience with socket programming, and with my approach the number of sockets scales directly with n (the number of client-server thread-pairs). This might go up to a couple of thousand, and I'm unsure whether this is a problem or not.
I have control of both the server and client machines and their source code, and there are no external restrictions I need to worry about. All I care about is performance.
I've looked through all the patterns here, but I cannot find one that matches the case where the client-server pairs are fixed, i.e. I cannot use load balancing and the like.
Happy man!
ZeroMQ is a lovely and powerful tool for highly scalable, low-overhead Formal Communication Patterns (behavioural patterns, yes, emulating some sort of mutual behaviour of the peers, "One Asks, the other Replies", et al).
Your pattern is quite simple, behaviourally unrestricted, and ZMQ_PAIR may serve well for this.
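For reference, a minimal sketch of the per-pair ZMQ_PAIR wiring described in the question, using the plain libzmq C API from C++ (one shared context, one socket per thread pair, port = base + pair id); the base port and endpoint strings are illustrative assumptions, not prescriptions.

```cpp
#include <zmq.h>
#include <cstdio>
#include <string>

// One end of the i-th fixed thread-pair: the server-side thread binds,
// the client-side thread connects to the same "base + i" port.
void* makePairSocket(void* sharedCtx, int pairId, bool isServer) {
    const int basePort = 5550;                       // illustrative base port
    std::string ep = "tcp://127.0.0.1:" + std::to_string(basePort + pairId);
    void* s = zmq_socket(sharedCtx, ZMQ_PAIR);       // one PAIR socket per pair
    int rc = isServer ? zmq_bind(s, ep.c_str())
                      : zmq_connect(s, ep.c_str());
    if (rc != 0) std::perror("zmq_bind/zmq_connect");
    return s;
}

int main() {
    void* ctx = zmq_ctx_new();                       // the single shared Context()
    void* s   = makePairSocket(ctx, 0, /*isServer=*/true);
    zmq_close(s);
    zmq_ctx_term(ctx);
}
```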
Performance
There ought to be some more details on the quantitative nature of this attribute:
a process-to-process latency [us]
a memory-footprint of a System-under-Test (SuT) architecture [MB]
a peak-amount of data-flow a SuT can handle [MB/s]
Performance Tips ( if quantitatively supported by observed performance data )
may increase I/O-performance by increasing Context( nIOthreads ) on instantiation
may fine-tune I/O-performance by hard-mapping individual thread# -> Context.IO-thread#, which is helpful both for distributing the workload and for keeping "separate" localhost IO thread(s) free / ready for higher-priority signalling and other such needs (see the sketch after these tips)
shall set up application-specific ToS-labelling of prioritised types of traffic, so as to allow advanced processing on the network layer along the route-segments between the client and server
if the memory footprint hurts ( ZeroMQ is not zero-copy on TCP-protocol handling at the operating-system kernel level ), one may try to move to a younger sister of ZeroMQ -- authored by Martin SUSTRIK, a co-father of ZeroMQ -- the POSIX-compliant nanomsg, with similar motivation and attractive performance figures. Worth knowing about, at least.
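A minimal sketch of the first three tips above, using the plain libzmq C API (zmq_ctx_set for the I/O-thread count, the ZMQ_AFFINITY socket option for hard-mapping a socket onto a particular I/O thread, ZMQ_TOS for ToS labelling); the concrete numbers are illustrative assumptions, to be tuned against observed SuT data.

```cpp
#include <zmq.h>
#include <cstdint>
#include <cstdio>

int main() {
    void* ctx = zmq_ctx_new();

    // Tip 1: raise the number of Context I/O threads from the default of 1.
    zmq_ctx_set(ctx, ZMQ_IO_THREADS, 4);             // illustrative count

    void* s = zmq_socket(ctx, ZMQ_PAIR);

    // Tip 2: hard-map this socket onto I/O thread #1 (bitmask, one bit per
    // thread), keeping thread #0 free for higher-priority signalling sockets.
    std::uint64_t affinity = 1ULL << 1;
    zmq_setsockopt(s, ZMQ_AFFINITY, &affinity, sizeof affinity);

    // Tip 3: ToS labelling of this socket's traffic (DSCP EF as an example),
    // so network gear along the client/server route-segments can prioritise it.
    int tos = 0xB8;
    zmq_setsockopt(s, ZMQ_TOS, &tos, sizeof tos);

    if (zmq_bind(s, "tcp://*:5555") != 0) std::perror("zmq_bind");

    zmq_close(s);
    zmq_ctx_term(ctx);
}
```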
Could ROUTER or DEALER increase an overall performance?
No, it could not. Having in mind your stated architecture ( declared to be communication-heavy ), other, even more sophisticated Scaleable Formal Communication Pattern behaviours, which suit other needs, do not add any performance benefit here; on the contrary, they would cost you additional processing overheads without delivering any justifying improvement.
While your Formal Communication remains as defined, no additional bells and whistles are needed.
One point may be noted about the ZMQ_PAIR archetype: some sources quote it as being rather an experimental archetype. If your gut feeling, besides SuT-testing observations, does not make you happy to live with this, do not mind a slight re-engineering step that keeps all the freedom of the un-prescribed Formal Communication Pattern behaviour while putting "non"-experimental pipes under the hood: just replace the solo ZMQ_PAIR with a pair of ZMQ_PUSH + ZMQ_PULL sockets and use messages with just a one-way ticket. Having the stated full control of the SuT design and implementation, this would all be within your competence.
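If that re-engineering step is taken, one side of such a pair could look like the sketch below: two one-way pipes per thread-pair instead of a single ZMQ_PAIR. The port arithmetic and endpoints are again my own illustrative assumptions.

```cpp
#include <zmq.h>
#include <string>

// One thread-pair now uses two one-way pipes: this end PUSHes on one port and
// PULLs on the neighbouring port; the opposite end mirrors the setup.
struct PairChannel {
    void* tx;   // outbound, ZMQ_PUSH
    void* rx;   // inbound,  ZMQ_PULL
};

PairChannel makeChannel(void* ctx, int pairId, bool isServer) {
    const int basePort = 6000;   // illustrative base port, two ports per pair
    std::string out = "tcp://127.0.0.1:"
                      + std::to_string(basePort + 2 * pairId + (isServer ? 0 : 1));
    std::string in  = "tcp://127.0.0.1:"
                      + std::to_string(basePort + 2 * pairId + (isServer ? 1 : 0));

    PairChannel ch;
    ch.tx = zmq_socket(ctx, ZMQ_PUSH);
    ch.rx = zmq_socket(ctx, ZMQ_PULL);
    if (isServer) { zmq_bind(ch.tx, out.c_str());    zmq_bind(ch.rx, in.c_str()); }
    else          { zmq_connect(ch.tx, out.c_str()); zmq_connect(ch.rx, in.c_str()); }
    return ch;
}

int main() {
    void* ctx = zmq_ctx_new();
    PairChannel ch = makeChannel(ctx, 0, /*isServer=*/true);
    zmq_close(ch.tx);
    zmq_close(ch.rx);
    zmq_ctx_term(ctx);
}
```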
How fast could I go?
There are benchmark test records published for both the ZeroMQ and nanomsg performance / latency envelopes, for un-loaded network transports across traffic-free route-segments ( sure ).
If your SuT-design strives to go even faster, say under some 800 ns end-to-end, there are other means to achieve this, but your design will have to follow a distributed-computing strategy other than message-based data exchange, and your project budget will have to allow for the additional expenditure on the necessary ultra-low-latency hardware infrastructure.
It may come as a surprise, but this is definitely well doable and pretty attractive for systems where hundreds of nanoseconds are a must-have target within a Colocation Data Centre.
We are planning to move some of the writes our back-end does from an RDBMS to NoSQL, as we expect them to be the main bottleneck.
Our business process has 95%-99% concurrent writes and only 1%-5% concurrent reads on average. There will be a massive amount of data involved, so an in-memory NoSQL DB won't fit.
What on-disk NoSQL DB would be optimal for this case?
Thanks!
If the concurrent writes create conflicts and data integrity is an issue, NoSQL is probably not your way to go. You can easily test this with a data-management system that supports "optimistic concurrency", as you can then measure the real-life locking conflicts and analyse them in detail.
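As a minimal illustration of what "optimistic concurrency" means here (my own generic sketch, not tied to any particular database product): each record carries a version, a writer validates that version at commit time, and a mismatch counts as a conflict that forces a re-read and retry.

```cpp
#include <cstdio>
#include <mutex>

// Toy versioned record: the mutex stands in for the database's short commit
// critical section; the version check is the "optimistic" part.
struct VersionedRecord {
    long version = 0;
    long value   = 0;
    std::mutex commitLock;
};

// Commit succeeds only if the record still has the version we read earlier;
// otherwise the caller must re-read and retry (a measurable write conflict).
bool optimisticCommit(VersionedRecord& rec, long readVersion, long newValue) {
    std::lock_guard<std::mutex> g(rec.commitLock);
    if (rec.version != readVersion) return false;   // somebody committed first
    rec.value = newValue;
    ++rec.version;
    return true;
}

int main() {
    VersionedRecord rec;
    long v = rec.version;                    // read phase: remember the version
    while (!optimisticCommit(rec, v, 42))    // validate-and-write phase
        v = rec.version;                     // conflict: re-read, retry
    std::printf("value=%ld version=%ld\n", rec.value, rec.version);
}
```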
I am a little bit surprised when you say that you "expect problems" without any further details, so let me give you one answer based on the facts you gave us. What are the 100,000 sources and what is the writing scenario? MySQL is not the best example of handling scalable concurrent writes, etc.
It would be helpful if you'd provide some kind of use case(s), or anything else helping to understand the problem in detail.
Let me take two examples. First: an in-memory database with an advanced write dispatcher, data versioning, etc. can easily take 1M "writers", the writers being network elements and the application an advanced NMS system. Lots of writes, no conflicts, optimistic concurrency, in-memory write buffering up to 16 GB, asynchronous parallel writing to 200+ virtual spindles (SSDs or magnetic disks), and so on. A real "sucker" for eating new data! An excellent candidate to scale performance to its limits.
Second example: an MSC with a sparse number space, e.g. mobile numbers forming "clusters" of numbers. A huge number space, but at most 200M individual addresses, and very rare situations with conflicting writes. The RDBMS was replaced with memory-mapped sparse files, and the performance improvement was close to 1000x, yes 1000x in the best case, and "only" 100x in the worst case. The replacement code was roughly 300 lines of C. That was true "big NoSQL", as it was a good fit to the problem to be solved.
So, in short: without knowing more details, there is no "silver bullet" answer to your question. We're not after werewolves here; it's just "big bad data". When we don't know whether your workload is "transactional", i.e. sensitive to the number of IOs and to latency, or "BLOB-like", i.e. streaming media, geodata, etc., any promise would be 100% wrong. Bandwidth and IO rate / latency / transactions are more or less a trade-off in real life.
See for example http://publib.boulder.ibm.com/infocenter/soliddb/v6r3/index.jsp?topic=/com.ibm.swg.im.soliddb.sql.doc/doc/pessimistic.vs.optimistic.concurrency.control.html for some further details.
What is the key difference between Fork/Join and Map/Reduce?
Do they differ in the kind of decomposition and distribution (data vs. computation)?
One key difference is that F-J seems to be designed to work on a single Java VM, while M-R is explicitly designed to work on a large cluster of machines. These are very different scenarios.
F-J offers facilities to partition a task into several subtasks in a recursive-looking fashion; more tiers, the possibility of "inter-fork" communication at this stage, much more traditional programming. It does not extend (at least in the paper) beyond a single machine. Great for taking advantage of your eight-core machine.
M-R only does one big split, with the mapped splits not talking to each other at all, and then reduces everything together. A single tier, no inter-split communication until the reduce, and massively scalable. Great for taking advantage of your share of the cloud.
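To make the contrast concrete, here is a small C++ sketch (my own illustration; the discussion above is about Java's fork/join framework and cluster MapReduce) of the two decompositions applied to the same summation: a recursive fork/join-style split versus a single map-over-chunks followed by one reduce.

```cpp
#include <algorithm>
#include <cstdio>
#include <functional>
#include <future>
#include <numeric>
#include <vector>

// Fork/join style: recursively split the range, fork the left half as a
// subtask, recurse on the right half, then join the partial results.
long fjSum(const std::vector<long>& v, std::size_t lo, std::size_t hi) {
    if (hi - lo <= 1024)
        return std::accumulate(v.begin() + lo, v.begin() + hi, 0L);
    std::size_t mid = lo + (hi - lo) / 2;
    auto left  = std::async(std::launch::async, fjSum, std::cref(v), lo, mid);
    long right = fjSum(v, mid, hi);
    return left.get() + right;                        // join
}

// MapReduce style: one flat split into chunks, map each chunk independently
// (no communication between chunks), then a single reduce over the results.
long mrSum(const std::vector<long>& v, std::size_t chunks) {
    std::vector<std::future<long>> mapped;
    std::size_t step = (v.size() + chunks - 1) / chunks;
    for (std::size_t lo = 0; lo < v.size(); lo += step) {
        std::size_t hi = std::min(lo + step, v.size());
        mapped.push_back(std::async(std::launch::async, [&v, lo, hi] {
            return std::accumulate(v.begin() + lo, v.begin() + hi, 0L);
        }));
    }
    long total = 0;                                   // reduce
    for (auto& f : mapped) total += f.get();
    return total;
}

int main() {
    std::vector<long> v(100000, 1);
    std::printf("fork/join: %ld, map/reduce: %ld\n",
                fjSum(v, 0, v.size()), mrSum(v, 8));
}
```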
There is a whole scientific paper on the subject, Comparing Fork/Join and MapReduce.
The paper compares the performance, scalability and programmability of three parallel paradigms: fork/join, MapReduce, and a hybrid approach.
What they find is basically that Java fork/join has low startup latency and scales well for small inputs (<5 MB), but it cannot process larger inputs due to the size restrictions of shared-memory, single-node architectures. On the other hand, MapReduce has significant startup latency (tens of seconds), but scales well for much larger inputs (>100 MB) on a compute cluster.
But there is a lot more to read there if you're up for it.