If I develop an online application using blockchain and I have three parties (or peers) required to reach consensus, do I need a node for each of the three participants, or is one node enough? What I am not able to understand is how I will maintain a node or nodes.
Will I be maintaining the database at one location?
First, understand that blockchain architecture is not the same as a normal client-server architecture.
In a normal client-server architecture, a client can change entries that are stored on a centralized server. Because the master copy is updated, whenever a user accesses the database they get the latest version.
This is not at all the same as with blockchain technology.
A blockchain is a mesh network of computers linked not to a central server but rather to each other. Computers in this network define and agree upon a shared state of data and adhere to certain constraints imposed upon this data. In a blockchain database, each participant maintains, calculates and updates new entries into the database. All nodes work together to ensure they are all coming to the same conclusions, providing built-in security for the network. The database is distributed across the participant nodes, and transactions are immutable.
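To make that concrete, here is a toy sketch (my own illustration, not tied to any real blockchain framework) of every node applying the same ordered transactions to its own copy and then cross-checking that all copies agree:

```java
import java.util.ArrayList;
import java.util.List;

public class ReplicatedLedger {
    static class Node {
        final List<String> ledger = new ArrayList<>();

        // Entries are append-only: existing entries are never edited.
        void apply(String transaction) {
            ledger.add(transaction);
        }
    }

    public static void main(String[] args) {
        List<Node> network = List.of(new Node(), new Node(), new Node());
        String[] agreedOrder = {"A pays B 5", "B pays C 3"};

        // Every participant maintains and updates its own full copy.
        for (Node node : network) {
            for (String tx : agreedOrder) {
                node.apply(tx);
            }
        }

        // Nodes can cross-check: identical histories mean a shared state.
        boolean consistent = network.stream()
            .allMatch(n -> n.ledger.equals(network.get(0).ledger));
        System.out.println("All nodes agree: " + consistent);
    }
}
```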
How will the blockchain maintain different nodes?
Through the use of a peer-to-peer network and a distributed timestamping server, a blockchain database is managed autonomously.
For a production network of three parties (in Hyperledger Fabric we use the term 'organizations') you would likely want to have a network of multiple (2 or more) peer nodes per organization for crash tolerance and increased resilience. You would also likely want to run these peer nodes on different host nodes in different data centers or cloud availability zones.
Recently, in a system design interview, I was asked a question where cities were divided into zones and data for around 100 zones was available. An API took the zone ID as input and returned all the restaurants for that zone in the response. The required response time for the API was 50 ms, so the zone data was kept in memory to avoid delays.
If the zone data is approximately 25 GB, then if the service is scaled to, say, 5 instances, it would need 125 GB of RAM.
Now the requirement is to run 5 instances but use only 25 GB of RAM, with the data split between instances.
I believe that to achieve this we would need a second application acting as a config manager, tracking which instance holds which zone data. The instances can fetch the zones they should track from the config manager service on startup. But what I am not able to figure out is how we redirect the request for a zone to the correct instance holding its data, especially if we use Kubernetes. Also, if an instance holding partial data restarts, how do we track which zone data it was holding?
Splitting a dataset over several nodes: that sounds like sharding.
In-memory: the interviewer might be asking about Redis or something similar.
Maybe this: https://redis.io/topics/partitioning#different-implementations-of-partitioning
Redis Cluster might fit. Keep in mind that when the docs mention "client-side partitioning", the client is a Redis client library loaded by your backends, which in turn respond to HTTP client/end-user requests.
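For illustration, a minimal sketch of client-side partitioning, assuming plain Jedis clients and a fixed node list; the `ZonePartitioner` name, the hosts, and the "zone:" key scheme are my own assumptions, not from the question:

```java
import redis.clients.jedis.Jedis;
import java.util.List;

public class ZonePartitioner {
    private final List<Jedis> nodes;

    public ZonePartitioner(List<Jedis> nodes) {
        this.nodes = nodes;
    }

    // Each zone ID is deterministically mapped to one node, so every
    // backend instance agrees on which node holds which zone. Real
    // deployments often use consistent hashing to limit resharding
    // when the node list changes.
    private Jedis nodeFor(String zoneId) {
        int index = Math.floorMod(zoneId.hashCode(), nodes.size());
        return nodes.get(index);
    }

    public String restaurantsFor(String zoneId) {
        return nodeFor(zoneId).get("zone:" + zoneId);
    }

    public static void main(String[] args) {
        // Hypothetical hosts; in practice these come from configuration.
        ZonePartitioner p = new ZonePartitioner(List.of(
            new Jedis("redis-1", 6379),
            new Jedis("redis-2", 6379)));
        System.out.println(p.restaurantsFor("zone-42"));
    }
}
```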
Answering your comment: in that case, I'm not sure what they were looking for.
Comparing Java HashMaps to a Redis cluster isn't completely fair, considering one is bound to your JVM, while the other is actually distributed/sharded, implying at least inter-process communication and most likely network/non-local queries.
Then again, if the question is how to scale an ever-growing JVM: at some point, we need to address the elephant in the room: how do you guarantee data consistency and proper replication/sharding, and what do you do when a member goes down?
A distributed hashmap, using Hazelcast, may be more relevant; a sketch follows below. Some (Hazelcast themselves) argue it is safer under heavy write load; others report that migrating from Hazelcast to Redis helped them improve service reliability. I don't have enough background in Java myself, so I wouldn't know.
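For what it's worth, a minimal sketch of the distributed-map idea, assuming Hazelcast 4.x or later's Java API; the map name "zones" and the sample entry are illustrative:

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.map.IMap;

public class ZoneStore {
    public static void main(String[] args) {
        // Each application instance joins the same cluster; Hazelcast
        // partitions the map's entries across the members automatically
        // and rebalances them when members join or leave.
        HazelcastInstance instance = Hazelcast.newHazelcastInstance();
        IMap<String, String> zones = instance.getMap("zones");

        zones.put("zone-42", "[restaurant list...]");
        // Reads are routed to whichever member owns the key's partition.
        System.out.println(zones.get("zone-42"));
    }
}
```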
As a general rule, when asked about Java, you could argue that speed and reliability depend very much on your developers' understanding of what they're doing, which, in Java, leaves a large margin for error. Then again, if they're asking such questions, they probably have some good devs on their payroll.
Distributed databases (in-memory or on disk, SQL or NoSQL), on the other hand, are quite a complicated topic that you would need to master (on top of Java) to get right.
The broad approach in question was described by Adya et al. in 2019 as a LInK store: Linked In-memory Key-value stores allow application objects supporting rich operations to be sharded across a cluster of instances.
I would tend to approach this by implementing a stateful application using Akka (disclaimer: I am at this writing employed by Lightbend, which employs the majority of the developers of Akka and offers support and consulting services to clients using Akka; as my SO history indicates, I would have the same approach even multiple years before I was employed by Lightbend) along these lines.
Akka Cluster to allow a set of JVMs running an application to form a cluster in a peer-to-peer manner and manage/track changes in the membership (including detecting instances which have crashed or are isolated by a network partition)
Akka Cluster Sharding to allow stateful objects keyed by ID to be distributed approximately evenly across a cluster and rebalanced in response to membership changes
These stateful objects are implemented as actors: they can update their state in response to messages and (since they process messages one at a time) without needing elaborate synchronization.
Cluster sharding implies that the actor responsible for a given ID might, over time, exist on different instances, so that implies some persistence of the zone's state outside of the cluster. For simplicity*, when an actor responsible for a given zone starts, it initializes itself from the datastore (could be S3, could be Dynamo or Cassandra or whatever): after this, its state is in memory, so reads can be served directly from the actor's state instead of going to an underlying datastore.
By directing all writes through cluster sharding, the in-memory representation is, by definition, kept in sync with the writes. To some extent, we can say that the application is the cache: the backing datastore only exists to allow the cache to survive operational issues (and because it's only in response to issues of that sort that the datastore needs to be read, we can optimize the data store for writes vs. reads).
Cluster sharding relies on a conflict-free replicated data type (CRDT) to broadcast changes in the shard allocation to the nodes of the cluster. This allows, for instance, any instance to handle an HTTP request for any shard: it simply forwards a representation of the important parts of the request as a message to the shard which will distribute it to the correct actor.
From Kubernetes' perspective, the instances are stateless: no StatefulSet or similar is needed. The pods can query the Kubernetes API to find the other pods and attempt to join the cluster.
*: I have a fairly strong prior that event sourcing would be a better persistence approach, but I'll set that aside for now.
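A rough sketch of the shape this takes, assuming Akka 2.6's typed Cluster Sharding Java API; `ZoneCommand`, the empty behavior, and the names are illustrative placeholders, not a full implementation:

```java
import akka.actor.typed.ActorSystem;
import akka.actor.typed.Behavior;
import akka.actor.typed.javadsl.Behaviors;
import akka.cluster.sharding.typed.javadsl.ClusterSharding;
import akka.cluster.sharding.typed.javadsl.Entity;
import akka.cluster.sharding.typed.javadsl.EntityRef;
import akka.cluster.sharding.typed.javadsl.EntityTypeKey;

public class ZoneSharding {
    // Marker interface for the messages a zone actor understands.
    public interface ZoneCommand {}

    public static final EntityTypeKey<ZoneCommand> TYPE_KEY =
        EntityTypeKey.create(ZoneCommand.class, "Zone");

    // Each zone's state lives inside its actor; messages are processed
    // one at a time, so no explicit synchronization is needed.
    public static Behavior<ZoneCommand> zoneActor(String zoneId) {
        // On start, this is where the actor would load its zone's data
        // from the backing store (S3, Cassandra, ...) before serving
        // reads from memory. Message handlers omitted in this sketch.
        return Behaviors.receive(ZoneCommand.class).build();
    }

    public static void init(ActorSystem<?> system) {
        // Registers the entity type; the cluster distributes zone actors
        // across instances and rebalances them on membership changes.
        // On Kubernetes, cluster formation is typically handled by Akka
        // Management's cluster bootstrap with kubernetes-api discovery.
        ClusterSharding.get(system).init(
            Entity.of(TYPE_KEY, ctx -> zoneActor(ctx.getEntityId())));
    }

    public static EntityRef<ZoneCommand> zoneRef(ActorSystem<?> system, String zoneId) {
        // Any instance can obtain a ref; sharding routes each message to
        // whichever node currently hosts that zone's actor.
        return ClusterSharding.get(system).entityRefFor(TYPE_KEY, zoneId);
    }
}
```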
By default, anyone can read the state data using the REST API. Is there a way to add read permissions on specific addresses and change them while the network is up?
The short answer to your question is: by using a proxy server. The documentation you're referring to in the question mentions it here: https://sawtooth.hyperledger.org/docs/core/releases/1.1/sysadmin_guide/rest_auth_proxy.html#using-a-proxy-server-to-authorize-the-rest-api
There may not be an out-of-the-box component that does exactly what you're asking, but it's definitely possible: you can add filtering logic based on the read address in the proxy server, along the lines of the sketch below.
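As a purely illustrative sketch (not an official Sawtooth component), here is a toy reverse proxy in Java that rejects reads of protected address prefixes; the port and the prefix list are assumptions of mine, and the actual forwarding to the REST API is omitted:

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.util.Set;

public class StateReadFilter {
    // Hypothetical address prefixes this organization keeps private.
    private static final Set<String> PROTECTED_PREFIXES = Set.of("1cf126");

    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8443), 0);
        server.createContext("/state", exchange -> {
            // REST API read paths look like /state/{address}.
            String address = exchange.getRequestURI().getPath()
                .replaceFirst("^/state/?", "");
            boolean denied = PROTECTED_PREFIXES.stream()
                .anyMatch(address::startsWith);
            if (denied) {
                exchange.sendResponseHeaders(403, -1); // Forbidden, no body
            } else {
                // A real proxy would forward the request to the Sawtooth
                // REST API here and relay the response; omitted in sketch.
                exchange.sendResponseHeaders(200, -1);
            }
            exchange.close();
        });
        server.start();
    }
}
```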
More explanation:
If you're considering one validator instance per organization, and each organization participates in a blockchain application use case, then all the participants in the network can see the data you store in the state store. It's the responsibility of the participating organizations to restrict access to their data; using the proxy server is one such means.
If you're considering multiple use cases per organization, participating in different networks altogether, then it is advisable to have a separate validator instance for each use case that requires isolation. Again, it's the responsibility of each organization to protect the data stored in the networks it participates in.
For the second point, the proposed Hyperledger Sawtooth 2.0 design allows you to run multiple instances of the validator as services in a single process. That means one physical node (and process) can participate in multiple circuits, providing isolation.
Before I end the answer, for the benefit of others searching for one: a blockchain is not just distributed storage but also a decentralized network. There are a number of design patterns that allow us to keep critical data outside the blockchain network and use the blockchain network's functionality (achieving consensus and smart-contract verification, specifically) for what it is expected to do.
Generally, we do not have the concept of peers in Corda; we always call them nodes. How do I create multiple peers (nodes) for a single organization (Party) in Corda?
Technically, you can register as many peers (nodes) as you want for the same organization, as long as you let your counterparties know that those X.509 names belong to your organization.
But keep in mind that your peers cannot share at the database level, meaning that if PeerA receives some info from an external party and stores it in its vault, PeerB cannot just go into PeerA's vault and use it for its transactions.
That is, PeerA and PeerB are entirely different parties at the Corda level. If you want to share some information, you need to run a flow (a minimal sketch follows below). This is to protect the provenance and immutability of the ledger.
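For illustration only, a minimal flow that sends a payload to another node, assuming Corda 4's Java API; `SendInfoFlow` and the plain String payload are hypothetical (real flows usually exchange states or signed transactions):

```java
import co.paralleluniverse.fibers.Suspendable;
import net.corda.core.flows.FlowException;
import net.corda.core.flows.FlowLogic;
import net.corda.core.flows.FlowSession;
import net.corda.core.flows.InitiatingFlow;
import net.corda.core.flows.StartableByRPC;
import net.corda.core.identity.Party;

@InitiatingFlow
@StartableByRPC
public class SendInfoFlow extends FlowLogic<Void> {
    private final Party counterparty;
    private final String info;

    public SendInfoFlow(Party counterparty, String info) {
        this.counterparty = counterparty;
        this.info = info;
    }

    @Suspendable
    @Override
    public Void call() throws FlowException {
        // Open a session with the other node and send the payload; the
        // counterparty needs a matching @InitiatedBy responder flow that
        // calls session.receive(String.class) and stores the result.
        FlowSession session = initiateFlow(counterparty);
        session.send(info);
        return null;
    }
}
```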
Looking to the future, our dev team is implementing a new accounts feature, to be released with Corda Enterprise 5. This feature allows having multiple accounts under the same node. If you would like more information, join the Slack channel and ping me: http://slack.corda.net
I need to develop an enterprise-grade, permissioned blockchain application using Hyperledger Fabric. To start with, I would like to understand how I should determine the number of nodes required. Basically, every organization will have one peer node that will process all the transactions. Apart from this, do we need nodes for anything else, and how many of them? From an architecture perspective, what aspects do I need to consider?
Each node can have some of these functions (router is a must):
router
full blockchain
wallet
miner
Since you're speaking about "one node per organization", you probably mean one "full" node per organization, meaning each organization will have a miner node containing the full blockchain locally.
The problem here is this: how can you guarantee each organization in your architecture will have the same mining power?
How can malicious activity be controlled in Hyperledger Fabric?
Any links will be helpful too.
Fabric is an implementation of a blockchain. A blockchain guarantees safety because it is a distributed system: the information is replicated on different nodes in the network, so if you changed one copy, the other nodes would notice.
On the other hand, a blockchain is a chain composed of blocks that store transactions. Each block references the previous one, so if you wanted to modify the data in one block, you would have to change all the subsequent blocks. This is computationally very difficult: you would need more than 51% of the computing capacity of the network.
You can read more about it here.
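To see why tampering is detectable, here is a toy sketch (standard library only, Java 17+ for HexFormat; no real blockchain framework assumed) of blocks chained together by hashes:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;

public class HashChainDemo {
    static String sha256(String input) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] hash = digest.digest(input.getBytes(StandardCharsets.UTF_8));
        return HexFormat.of().formatHex(hash);
    }

    public static void main(String[] args) throws Exception {
        String[] transactions = {"A pays B 5", "B pays C 3", "C pays A 1"};
        String previousHash = "0"; // genesis
        for (String tx : transactions) {
            // Each block's hash covers its data AND the previous block's
            // hash, chaining the blocks together.
            String blockHash = sha256(previousHash + tx);
            System.out.println(tx + " -> " + blockHash);
            previousHash = blockHash;
        }
        // Altering "B pays C 3" would change its hash, which would no
        // longer match the previousHash stored in the following block,
        // so every later block would have to be recomputed as well.
    }
}
```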
You can control transaction throughput, with a potential maximum per port or per user per second.
Create a transaction fee, which would limit what they are doing if they are taxing the network.
Kick them off the network.