What's preventing the Ethereum blockchain from getting too big too fast? - blockchain

So I recently started looking at Solidity on the Ethereum blockchain, and have a question about the amount of data that smart contracts generate.
I'm aware that there is a size limit for the bytecode generated by the contract itself, and that it cannot exceed about 24 KB. There's an upper limit on transaction size too. However, what I'm curious about is that, since there's no limit on the variables a smart contract stores, what is stopping those variables from getting very large? For popular smart contracts like Uniswap, I would imagine they can generate hundreds of thousands of transactions per day, and the state they keep would be huge.
If I understand it correctly, basically every node on the chain stores the whole blockchain, so limiting the size of the blockchain would be very important. Is there anything done to limit the size of smart contracts, which I think is dominated mainly by the state variables they store?

Is there anything done to limit the size of smart contracts, which I think is dominated mainly by the state variables they store?
No. Ethereum will grow indefinitely, and currently there is no viable plan to limit state growth besides keeping transaction costs high and block space at a premium.
You can read more about this in my post Scaling EVM here.
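As a rough illustration of the "keep transaction costs high" point, here is a hedged back-of-envelope sketch (not from the original answer) of what it would cost to grow contract storage by 1 GB, assuming roughly 20,000 gas per new 32-byte storage slot (the approximate SSTORE cost for a fresh slot) and an illustrative gas price of 30 gwei:

```python
# Back-of-envelope: cost of growing contract storage by 1 GB.
# ASSUMPTIONS: ~20,000 gas per new 32-byte storage slot and a gas price
# of 30 gwei; both are illustrative, not exact protocol constants.

SLOT_BYTES = 32
GAS_PER_NEW_SLOT = 20_000
GAS_PRICE_GWEI = 30
ETH_PER_GWEI = 1e-9

def eth_cost_to_store(n_bytes: int) -> float:
    """Rough ETH cost of writing n_bytes of fresh contract storage."""
    slots = -(-n_bytes // SLOT_BYTES)          # ceiling division
    gas = slots * GAS_PER_NEW_SLOT
    return gas * GAS_PRICE_GWEI * ETH_PER_GWEI

print(f"1 GB of new state: ~{eth_cost_to_store(1_000_000_000):,.0f} ETH")
# ~18,750 ETH under these assumptions: state growth is limited economically,
# not by a hard cap on how much storage a contract may hold.
```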

TLDR: The block size limit.
The protocol has a hardcoded limit that prevents the blockchain from growing too fast.
Full Answer
Growth Speed
The protocol measures storage (and computation) in a unit called gas. Each transaction consumes more or less gas depending on what it's doing: an ether transfer costs 21k gas, while a Uniswap v2 swap consumes around 100k gas. Deploying big contracts consumes more.
The current gas limit is 30 million per block, so the actual number of transactions per block varies even if the blocks are always full (some transactions consume more gas than others).
FYI: this is why "transactions per second" is a BS marketing metric for blockchains with rich smart contracts.
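To make that concrete, here is a small sketch (not part of the original answer) that derives transactions-per-block and a rough TPS figure from the gas numbers above, assuming a block time of about 12 seconds:

```python
# Why "transactions per second" depends on the transaction mix:
# a block has a fixed gas budget, not a fixed transaction count.
# Gas figures are the approximate ones quoted above; ~12 s block time assumed.

BLOCK_GAS_LIMIT = 30_000_000
BLOCK_TIME_S = 12

gas_costs = {
    "plain ETH transfer": 21_000,
    "Uniswap v2 swap":    100_000,   # approximate
}

for name, gas in gas_costs.items():
    per_block = BLOCK_GAS_LIMIT // gas
    print(f"{name:>20}: ~{per_block} per block, ~{per_block / BLOCK_TIME_S:.0f} TPS")
# A block full of transfers fits ~1428 of them, a block full of swaps only ~300,
# so the headline TPS number changes with the workload.
```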
Deeper Dive
Storage as of June 2022
The Ethereum blockchain is currently ~180 GB in size. This is the data that is critical to the chain's existence and from which absolutely everything else is calculated.
Péter Szilágyi is the lead developer of Geth, the oldest, flagship Ethereum node implementation.
That being said, nodes generate a lot of data while processing the blockchain in order to compute the current state (i.e. how much money you have in your wallet right now).
Today, if you want to run a node that stores every single block and transaction starting from genesis (what Bitcoin engineers, but not Ethereum engineers, call an archive node), you currently need around 580 GB, and this grows as the node runs. See Etherscan's geth node after deleting some locally generated data, June 26, 2022.
If you want to run what Ethereum engineers call an archive node - a node that not only keeps all blocks from genesis but also never deletes generated data - then you currently need about 1.5 TB of storage using Erigon.
Older clients that do not use flat key-value storage generate considerably more data (on the order of 10 TB).
The Future
There are a lot of proposals, research and active development efforts working in parallel, so this part of the answer might become outdated. Here are some of them:
Sharding: Ethereum will split data (but not execution) across multiple shards, without losing confidence that the entirety of it is available, via Data Availability Sampling;
Layer 2 technologies: these move computation (and the gas it consumes) to another layer, without losing guarantees of the first layer such as censorship resistance and security. The two most promising instances of this (on Ethereum) are optimistic and zero-knowledge rollups.
State expiry: registers, cache, RAM, SSD, HDD and tape libraries are storage solutions ordered from fastest and most expensive to slowest and cheapest. Ethereum will follow the same strategy: move state data that is not accessed often to cheaper places;
Verkle Trees;
Portal network;
State Rent;
Bitcoin's Lightning network is the first blockchain layer 2 technology.

Related

Understanding why AWS Elasticsearch GC time (young and old) keeps rising while memory pressure does not

I am trying to understand whether I have an issue with my AWS Elasticsearch garbage collection time, but all the memory-related issues that I find relate to memory pressure, which seems OK-ish.
So while I run a load test on the environment, I observe a constant rise in all GC collection time metrics, for example:
But when looking at memory pressure, I see that I am not passing the 75% mark (though getting near it), which, according to the documentation, is what triggers a concurrent mark & sweep.
So I fear that once I add more load or run a longer test, I might start seeing real issues that will have an impact on my environment. So, do I have an issue here? How should I approach rising GC time when I can't take memory dumps to see what's going on?
The top graph reports aggregate GC collection time, which is what's available from GarbageCollectorMXBean. It continues to increase because every young generation collection adds to it. And in the bottom graph, you can see lots of young generation collections happening.
Young generation collections are expected in any web app (which is what an OpenSearch cluster is): you're constantly making requests (queries or updates), and those requests create garbage.
I recommend looking at the major collection statistics. In my experience with OpenSearch, these happen when you're performing large numbers of updates, perhaps as a result of coalescing indexes. However, they should be infrequent unless you're constantly updating your cluster.
If you do experience memory pressure, the only real solution is to move to a larger node size. Adding nodes probably won't help, due to the way that indexes are sharded across nodes.
I sent a query to AWS technical support and, counter to any intuitive behavior, the Young and Old Collection time and count values in Elasticsearch are cumulative. This means the values keep increasing and do not drop back to 0 until a node drops or restarts.
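Given that the metrics are cumulative, a practical way to read them is to look at the difference between consecutive samples rather than the raw values. A minimal sketch with made-up sample data:

```python
# Since the GC time/count metrics are cumulative, the useful signal is the
# delta between consecutive samples, not the raw value. The samples below are
# hypothetical (seconds of total young-gen collection time, reported per minute).

samples = [12.0, 12.4, 12.9, 13.1, 13.8, 14.0]

deltas = [b - a for a, b in zip(samples, samples[1:])]
print("GC time spent per interval:", deltas)
# A steadily rising cumulative curve with small, stable deltas is normal;
# growing deltas (more GC work per interval) are what indicate pressure.
```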

Can you run out of storage space while running a blockchain node?

Maybe a dumb question, and perhaps not one for Stack Overflow, but from what I understand, running a full node requires locally storing the entire ledger for the network you're running. Correct? What happens if the size of the ledger increases beyond a user's storage capacity, though?
For example, what if a user has 10 GB of storage space and the size of the ledger grows beyond 10 GB? Does that node just die? Is the circulating supply in place to solve this problem? What about networks with infinite supply?
You get an insufficient disk space error, the node cannot keep the ledger up to date, and it falls behind. Depending on the implementation of your node, the policies of the chain, etc., different things can happen. In some staking chains, your node may be considered unable to validate new transactions and therefore unreliable. This can eat into your staked value, depending on what the policies are.
Circulating supply doesn't mean much for storage, but length of history, contained data, ability to prune, etc. all have large effects. For instance, you can run a full BSV node but aggressively prune it while still being able to validate the chain. This matters because removing the block size limits leaves node operators with an untenable cost. Ripple's full history was over 14 TB recently, and keeping it isn't recommended for anyone, even enterprises.
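For node operators, the practical takeaway is to watch free space on the node's data directory before the ledger outgrows it. A minimal sketch, with a hypothetical path and threshold:

```python
# Minimal sketch: warn before the node's data directory outgrows the disk.
# The path and threshold below are hypothetical placeholders.
import shutil

DATA_DIR = "/var/lib/mynode"      # hypothetical node data directory
MIN_FREE_GB = 50                  # hypothetical safety margin

usage = shutil.disk_usage(DATA_DIR)
free_gb = usage.free / 1e9
if free_gb < MIN_FREE_GB:
    print(f"Only {free_gb:.1f} GB free: the node will stop syncing once the disk fills.")
else:
    print(f"{free_gb:.1f} GB free.")
```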

How do clients of a distributed blockchain know about consensus?

I have a basic blockchain I wrote to explore and learn more about the technology. The only real-world experience I have with them is in one-to-one transactions from client to server, as a record of transactions. I'm interested in distributed blockchains now.
In its simplest, most theoretical form, how is consensus managed? How do peers know when to begin writing transactions to the next block? You have to know when >50% of the entire pool has accepted the last block written. But P2P systems can be essentially unbounded, and you can't trust a third party to handle surety, so how is this accomplished?
edit: I now know roughly how bitcoin handles consensus:
The consensus determines the accepted blockchain. The typical rule of "longest valid chain first" ensures that only one variant is accepted. People may accept a blockchain after any number of confirmations, typically 6 is sufficient to ensure a clear winner.
However, this seems like a slow and least-deliberate method. It ensures that there is a certain amount of wasted work on the part of nodes that happen to be in a part of the network that had a local valid solution at roughly the same time as a generally accepted solution.
Are there better alternatives?
Interesting question. I would say blockchain technology solves only probabilistic consensus: with a certain confidence, the blockchain network agrees on something.
Viewing blockchain as a distributed system, we can say that the state of the blockchain is distributed: the blockchain is kept as a whole, but there are many distributed local copies. More interestingly, the operations are distributed: writes and reads can happen at different nodes concurrently. Read operations can be served locally from the local copy of the blockchain, but such a read can of course be stale if your local copy is not up to date. However, there is always an incentive for nodes in the blockchain network to keep their local copy up to date so that they can complete new transactions when necessary.
Write operations are the tricky part that blockchain must solve. As writes happen concurrently in a distributed fashion, the blockchain must avoid inconsistencies such as double spending and somehow reach consensus on the current state. The way blockchain does this is probabilistic. First of all, writes to the chain are made expensive by the "puzzle" that has to be solved, which reduces the probability that different distributed writes happen concurrently; they can still happen, but with lower probability. In addition, since there is an incentive for nodes in the network to keep their state up to date, nodes that receive the flooded write operation will validate it and accept it into their chain. I think the incentive to always keep the chain up to date is key here, because it ensures that the chain makes progress: a writer has a clear incentive to keep its chain up to date, since it is competing under the "longest chain first" principle against other concurrent writers. For non-adversarial miners there is also an incentive to interrupt the current mining, accept a new write block and restart the mining process, ensuring a sort of liveness in the system.
So blockchain relies on probabilistic consensus; what is the probability, then? The probability that two exactly equal branches grow in parallel at the same time is close to 0, assuming there is no large group of adversarial nodes taking over the network. With very high probability one branch will be longer than the other; it will be accepted, the network reaches consensus on it, and write operations in the shorter branch have to be retried. The big concern is of course large adversarial miner groups who might deliberately try to create forks in the blockchain to perform double-spending attacks, but that is only likely to succeed if they control close to 50% of the computational power in the network.
So to conclude: natural branching in the blockchain, which can happen due to concurrent writes (with probability reduced by the puzzle solving), will with almost 100% probability converge to a single branch as write operations continue, and the network reaches consensus on that single branch.
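For readers who want to see the probability argument in numbers: the sketch below (not part of the original answer) implements the attacker catch-up calculation from the appendix of the Bitcoin whitepaper, which is one way to quantify how quickly a shorter adversarial branch becomes unlikely to win as confirmations accumulate:

```python
# Attacker catch-up probability from the Bitcoin whitepaper appendix:
# given an attacker with fraction q of the hash power and z confirmations,
# what is the chance the attacker's branch still overtakes the honest one?
import math

def attacker_success(q: float, z: int) -> float:
    p = 1.0 - q
    lam = z * (q / p)
    total = 1.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam**k / math.factorial(k)
        total -= poisson * (1.0 - (q / p) ** (z - k))
    return total

for z in (0, 1, 2, 6):
    print(f"q=10% of hash power, {z} confirmations: {attacker_success(0.10, z):.6f}")
# With q = 0.10 the probability falls below 0.1% after 6 confirmations,
# which is why ~6 confirmations is the conventional rule of thumb.
```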
However, this seems like a slow and least-deliberate method. It
ensures that there is a certain amount of wasted work on the part of
nodes that happen to be in a part of the network that had a local
valid solution at roughly the same time as a generally accepted
solution.
Are there better alternatives?
Not that I can think of. There would be many more efficient solutions if all peers in the system were "under control" and you could make them follow some protocol, perhaps with a designated leader to decide the order of writes and ensure consensus, but that is not possible in a decentralized open system.
In a permissioned blockchain environment, where the participants are known in advance, a client can get cryptographic proof of the consensus (e.g. that a block was signed by at least 2/3 of the participants) and verify it. Usually this can be achieved using threshold signatures.
In public blockchains, AFAIK, there is no way to do this, since the number of participants is unknown and changes all the time.

How many 9s of durability does an S3 object have when replicated across 'n' Regions/Buckets?

The S3 documentation states that S3 object durability is 99.999999999% (11 nines) over a year. How many 9s of durability does an object have if it is replicated/copied across 'n' regions/buckets?
This question started me wondering... how do you put a number like this on durability? How does S3 come up with 11 9's of durability, and why is the durability of the old Reduced Redundancy Storage (RRS) class apparently so much lower, at only 99.99% (4 9's), even though it's still stored in 2 AZs, not 3?
The answer appears to lie in the statistics of the annual failure rate (AFR) of each individual storage entity. That entity might be a hard drive, but given that commodity hard drives have a statistically higher failure rate -- perhaps as high as 4% AFR -- a "storage device" might be a RAID array or other cluster technology where each independent storage entity has a 1% AFR. I'll refer to this entity as a "storage device" for simplicity. My intention is not to claim that S3 uses n hard drives to store objects; that is almost certainly an oversimplification, and I have no insight into the inner workings of S3.
Let's briefly assume, for illustration purposes, that the AFR of a storage device in a well-maintained fleet is 1%. Obviously, this assumes the physical drives are removed from service before they reach an excessive age, otherwise they would of course all fail, eventually.
Running with the assertion that the likelihood of losing a storage device in a given year is 1/100, the probability of it surviving that year is 99%. We can then call the device's contents 99% durable, annually.
If we have the same data stored on two such devices, and the system is designed such that the failure of both devices is unlikely to have any correlatable cause (e.g., not only are they not in the same cabinet or on the same power supply, they're not even in the same building), we can say concurrent failures are statistically independent, and we can determine the likelihood of losing both devices concurrently (resulting in the loss of the contents) by multiplying the probabilities together: 0.01 × 0.01 = 0.0001, or 0.01%. Thus, with the same content on both devices, the durability improves to 99.99%.
We can extrapolate this out to a number of storage devices:
Devices  Annual loss probability  Durability
1        0.010000000000           99%
2        0.000100000000           99.99%
3        0.000001000000           99.9999%
4        0.000000010000           99.999999%
5        0.000000000100           99.99999999%
6        0.000000000001           99.9999999999%
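The table can be reproduced with a few lines of code, under the same assumptions of a 1% AFR and fully independent failures:

```python
# Reproduces the table above: durability of an object kept on n independent
# storage devices, each with an assumed 1% annual failure rate (AFR).
AFR = 0.01

def durability(n_devices: int) -> float:
    loss = AFR ** n_devices          # probability all copies are lost in the same year
    return 1.0 - loss

for n in range(1, 7):
    print(f"{n} devices: loss {AFR**n:.12f}, durability {durability(n) * 100:.10f}%")

# Six devices (two per AZ across three AZs, under these assumptions) gives
# 99.9999999999%, in the neighbourhood of S3's published 11 nines.
```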
Curiously, we arrive at numbers very similar to the published specs of S3, which we know stores objects redundantly across 3 availability zones. If we assume "redundantly" means two storage devices in each of these zones, then we arrive very close to 11 9's of durability (it's actually slightly higher).
Reduced Redundancy Storage stores objects replicated fewer times and in only 2 availability zones, and we find that the statistical failure rate of 2 devices does predict a durability of 99.99%.
All of this is to try to establish what "durability" really means with regard to stored objects, and it certainly seems to refer to the odds against every copy of the object being lost.
By extension, replicating an object to a second AWS region means we multiply the two infinitesimally small loss probabilities together, which increases the statistical durability by roughly another 11 9's (to ~22 9's), because the failure of 12 independent storage devices in 6 availability zones across 2 different regions should be completely uncorrelated, and so unlikely as to be effectively impossible.
The problem, of course, is that at these small numbers, the odds of something else going wrong, unrelated to pure durability -- like an administrative error, a malicious event, or even a defect in S3 -- would seem to become more likely by comparison... but replication across regions may help guard against these things as well. Object versioning is also an excellent feature for helping prevent data loss, since certain kinds of inadvertent errors become less likely to occur.
From Amazon S3 FAQ:
Q: How durable is Amazon S3?
Amazon S3 Standard and Standard - IA are designed to provide 99.999999999% durability of objects over a given year. This durability level corresponds to an average annual expected loss of 0.000000001% of objects. For example, if you store 10,000 objects with Amazon S3, you can on average expect to incur a loss of a single object once every 10,000,000 years. In addition, Amazon S3 is designed to sustain the concurrent loss of data in two facilities.
Each object is replicated "behind the scenes" to two additional data centers. Each data center is physically separate, with separate or redundant facilities (network, power, etc.).
If you are not satisfied with 11 9's of durability, you could use cross-region replication to copy objects to a bucket in a different region, which would again be replicated across three data centers (making 6 copies in total).
The durability would then be 1 - ((1 - 0.99999999999) × (1 - 0.99999999999)), since the product is the chance of all six facilities failing. If that happened, you would have worse things to worry about in your life than your data being lost (e.g. global thermonuclear war).
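A quick sanity check of the FAQ's arithmetic and of the cross-region figure, again assuming the two regions fail independently:

```python
# Quick check of the FAQ arithmetic and of the cross-region replication claim,
# assuming losses in the two regions are statistically independent.
annual_loss = 1e-11                      # 99.999999999% durability = 11 nines

expected_losses = 10_000 * annual_loss   # expected objects lost per year
print(f"Expected losses per year for 10,000 objects: {expected_losses:.0e}")
print(f"Average years per single lost object: {1 / expected_losses:,.0f}")

both_lost = annual_loss ** 2             # lose the copy in both regions the same year
print(f"Annual chance of losing both regional copies: {both_lost:.0e}  (~22 nines)")
```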

Which key-value / NoSQL database can ensure no data loss in case of a power failure?

At present, we are using Redis as a fast, in-memory cache. It is working well. The problem is that once Redis is restarted, we need to re-populate it by fetching data from our persistent store. This overloads our persistent store beyond its capacity, and hence the recovery takes a long time.
We looked at Redis persistence options. The best option (without compromising performance) is to use AOF with 'appendfsync everysec'. But with this option, we can lose the last second of data. That is not acceptable. Using AOF with 'appendfsync always' has a considerable performance penalty.
So we are evaluating single-node Aerospike. Does it guarantee no data loss in case of power failures? i.e., in response to a write operation, once Aerospike sends success to the client, the data should never be lost, even if I pull the power cable of the server machine. As I mentioned above, I believe Redis can give this guarantee with the 'appendfsync always' option, but we are not considering it because of the considerable performance penalty.
If Aerospike can do it, I would want to understand in detail how persistence works in Aerospike. Please share some resources explaining the same.
We are not looking for a distributed system as strong consistency is a must for us. The data should not be lost in node failures or split brain scenarios.
If not aerospike, can you point me to another tool that can help achieve this?
This is not a database problem, it's a hardware and risk problem.
All databases (that have persistence) work the same way: some write the data directly to the physical disk while others tell the operating system to write it. The only way to ensure that every write is safe is to wait until the disk confirms the data is written.
There is no way around this and, as you've seen, it greatly decreases throughput. This is why databases use a memory buffer and write batches of data from the buffer to disk at short intervals. However, this means there is a small risk that a machine issue (power, disk failure, etc.) occurring after the data is written to the buffer but before it's written to the disk will cause data loss.
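The throughput cost of waiting for the disk can be demonstrated directly. A minimal sketch (the file paths are just placeholders) comparing buffered writes with writes followed by fsync, which is essentially what 'appendfsync always' pays for:

```python
# Minimal sketch of the trade-off described above: a buffered write returns as
# soon as the data is in the OS page cache, while fsync blocks until the disk
# confirms it. File paths below are placeholders.
import os, time

def timed_writes(path: str, n: int, sync_every_write: bool) -> float:
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n):
            f.write(b"x" * 128)
            f.flush()
            if sync_every_write:
                os.fsync(f.fileno())        # wait for the physical disk
    return time.perf_counter() - start

print("buffered:", timed_writes("/tmp/buffered.log", 1000, sync_every_write=False))
print("fsync'd :", timed_writes("/tmp/synced.log", 1000, sync_every_write=True))
# The fsync'd run is typically orders of magnitude slower, which is the
# performance penalty the question attributes to 'appendfsync always'.
```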
On a single server, you can buy protection through multiple power supplies, battery backup, and other safeguards, but this gets tricky and expensive very quickly. This is why distributed architectures are so common today for both availability and redundancy. Distributed systems do not mean you lose consistency, rather they can help to ensure it by protecting your data.
The easiest way to solve your problem is to use a database that allows for replication so that every write goes to at least 2 different machines. This way, one machine losing power won't affect the write going to the other machine and your data is still safe.
You will still need to protect against a power outage at a higher level that can affect all the servers (like your entire data center losing power) but you can solve this by distributing across more boundaries. It all depends on what amount of risk is acceptable to you.
Between tweaking the disk-write intervals in your database and using a proper distributed architecture, you can get the consistency and performance requirements you need.
I work for Aerospike. You can choose to have your namespace stored in memory, on disk or in memory with disk persistence. In all of these scenarios we perform favourably in comparison to Redis in real world benchmarks.
Considering storage on disk: when a write happens, it hits a buffer before being flushed to disk. The ack does not go back to the client until that buffer has been successfully written to. It is plausible that if you yank the power cable before the buffer flushes, in a single-node cluster the write might have been acked to the client and subsequently lost.
The answer is to have more than one node in the cluster and a replication factor >= 2. The write then goes to the buffer on the master and the replica, and has to succeed on both before being acked to the client as successful. If the power is pulled from one node, a copy still exists on the other node and no data is lost.
So, yes, it is possible to make Aerospike as resilient as it is reasonably possible to be at low cost with minimal latencies. The best thing to do is to download the community edition and see what you think. I suspect you will like it.
I believe Aerospike would serve your purpose. You can configure it for hybrid storage at the namespace (i.e. DB) level in aerospike.conf,
which is present at /etc/aerospike/aerospike.conf.
For details please refer to the official documentation here: http://www.aerospike.com/docs/operations/configure/namespace/storage/
I believe you're going to be at the mercy of the latency of whatever the storage medium is, or the latency of the network fabric in the case of a cluster, regardless of what DBMS technology you use, if you must have a guarantee that the data won't be lost. (N.B. Ben Bates' solution won't work if there is a possibility that the whole physical plant loses power, i.e. both nodes lose power. But I would think an inexpensive UPS would substantially, if not completely, mitigate that concern.) And those latencies are going to cause a dramatic insert/update/delete performance drop compared to a standalone in-memory database instance.
Another option to consider is to use NVDIMM storage for either the in-memory database or for the write-ahead transaction log used to recover from. It will have the absolute lowest latency (comparable to conventional DRAM). And, if your in-memory database will fit in the available NVDIMM memory, you'll have the fastest recovery possible (no need to replay from a transaction log) and comparable performance to the original IMDB performance because you're back to a single write versus 2+ writes for adding a write-ahead log and/or replicating to another node in a cluster. But, your in-memory database system has to be able to support direct recovery of an in-memory database (not just from a transaction log). But, again, two requirements for this to be an option:
1. The entire database must fit in the NVDIMM memory
2. The database system has to be able to support recovery of the database directly after system restart, without a transaction log.
More in this white paper http://www.odbms.org/wp-content/uploads/2014/06/IMDS-NVDIMM-paper.pdf