AFAIK, after validating a block, a node runs all the transactions in that block, updating its state (the set of UTXOs).
Let's imagine that at some point the node realizes it has been on the wrong chain and that a longer chain is available, one that forked some blocks earlier.
How does it make the switch? I imagine it should run all transactions in reverse back to the fork point to restore the state, and then replay all transactions in the blocks from the longer chain?
Thanks!
Each node receives individual transactions as well as individual blocks from the network.
It also keeps the most updated blockchain locally.
For every new transaction it receives, the node validates it, and if valid, propagates to its peers.
For every block the node receives, it validates it. The validation includes several steps, among which:
1. checking that the block points to the most recent block in the blockchain (its preceding block)
2. checking that all transactions included in the block are valid.
A fork is a temporary situation, possible when there are 2 (or more) valid blocks which arrive at a node at pretty much the same time, so the node doesn't know which is the right one. It keeps the first one added to its local blockchain as the main chain, and keeps the second one as a fork chain (also locally), until the next block arrives and is added to one of the two. When that happens, the longer chain is chosen to be the main blockchain (at that node!), and the other is kept as a side chain.
All such side chains are kept in the node's memory for some time, until it can be sure they are no longer relevant (since they are shorter than the main blockchain by several blocks), and then they are removed.
I don't know why you have the picture that anything has to be "rolled back". Yes, it's rolling back, but no calculations with transactions have to be done at all. Here's why:
When node A has N+5 blocks and node B has N+2 blocks, then all node B has to do is drop those additional two blocks and take the 5 new blocks from A.
That's all! Yes, it effectively is rolling back, but nothing has to be run in reverse, because dropping blocks is effectively equivalent to reversing transactions.
Remember that transactions are directed, so they happen only in one direction in time. Meaning: in a valid block, every non-coinbase transaction in block number N has to have some history in previous blocks, so block N depends on that history, but the opposite is not true. Previous blocks don't depend on block N, so dropping N won't invalidate them.
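A rough sketch of that switch (a hypothetical Block type and an in-memory chain, not any real client's code): find the fork point, drop the local blocks above it, and append the blocks from the longer chain.

#include <cstddef>
#include <cstdint>
#include <vector>

struct Block {
    uint64_t hash;      // simplified: real clients use 256-bit hashes
    uint64_t prevHash;
};

// Replace the tail of `local` with the blocks of the longer chain `incoming`,
// starting at the first height where the two chains disagree (the fork point).
// Both vectors are assumed to start at the genesis block, so index == height.
void reorganize(std::vector<Block>& local, const std::vector<Block>& incoming) {
    std::size_t forkPoint = 0;
    while (forkPoint < local.size() && forkPoint < incoming.size() &&
           local[forkPoint].hash == incoming[forkPoint].hash) {
        ++forkPoint;                      // common prefix of both chains
    }
    local.resize(forkPoint);              // "drop these additional blocks"
    local.insert(local.end(),             // "take the new blocks from A"
                 incoming.begin() + forkPoint, incoming.end());
}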
Our blockchain consists of a sequence of blocks chained together like a linked list that goes only backward, where each block header has a prevHash (Bitcoin) or parentHash (Ethereum).
Here's my question: If I can't go forward, how do I know I'm on the latest block?
Do I ask other peers in the network what block they are on?
And, theoretically, if I'm the only one on the network, what happens then?
If I can't go forward, how do I know I'm on the latest block?
Each miner, after creating a block, broadcasts the new block (including its number) to the network.
So each node (both miners and non-miners) gets the latest block from this broadcasted message.
Plus, you can also ask other peers directly about their current block to validate if you're on the current block as well.
Note: Sometimes multiple miners create and broadcast a valid block with the same number. Only one of them is accepted by the network (usually the one with the lowest timestamp) and the others are discarded as "uncle blocks".
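A toy sketch of comparing your tip against peer-reported heights (how the heights are obtained is left to whatever p2p/RPC mechanism the client already uses; this is just the comparison):

#include <algorithm>
#include <cstdint>
#include <vector>

// Returns true if our tip is at least as high as every peer's reported tip.
bool amIOnTheLatestBlock(uint64_t myHeight, const std::vector<uint64_t>& peerHeights) {
    if (peerHeights.empty()) return true;  // alone on the network: my block is the latest
    uint64_t best = *std::max_element(peerHeights.begin(), peerHeights.end());
    return myHeight >= best;
}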
And, theoretically, if I'm the only one on the network, what happens then?
Then you're the only miner as well, so you also have the last block number from the block that you mined.
After reading Grokking Bitcoin I now have a broad idea of how Bitcoin works, but I still have a doubt: how, in general, can a blockchain-based system guarantee the immutability of an arbitrary block? I know every block stores a hash of the content of the previous block plus some nonce.
Let's say this blockchain (for simplicity, a linked-list style rather than a Merkle-tree style) has 1000 blocks, and a hacker has just changed the content of the 10th block. Of course, if we recompute the hash of this 10th block and compare it with the hash stored inside the 11th block, it will most probably be different.
My question is: should a blockchain-based system periodically check the hash inside every block to detect whether the content of a block has changed? If the system does not periodically recompute the hash of the 10th block, it will not be able to detect the change, right? In other words, how does a blockchain-based system detect the change of a block?
Thanks
When you change the content of the 10th block you have to find a hash that meets the difficulty for that block. This is called "mining" and it takes time/energy. When you have finally found the hash value for that block, you can propagate your new block to all the other nodes.
However, they will ignore this block, which would create a new branch of the blockchain at the 9th block, because it is 991 blocks behind the current blockchain (that's about 7 days behind). Miners will only work on the longest chain for new blocks, not on the most recently received chain. So your new 10th block will not be saved or used at all. Your effort in calculating the hash is wasted. Furthermore, the original 10th block is still inside the blockchain known by every other node, and it has not been changed. You have just mined a different 10th block that no other node cares about.
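For intuition on the detection itself: a node validating a chain recomputes each block's hash and compares it with the next block's stored previous-hash (exactly the comparison described in the question), so no separate periodic scan is needed. A toy sketch, with std::hash standing in for double-SHA256:

#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

struct Block {
    std::string content;
    uint64_t    prevHash;   // hash of the previous block
};

uint64_t toyHash(const Block& b) {
    return std::hash<std::string>{}(b.content + std::to_string(b.prevHash));
}

// A chain is valid only if every block's stored prevHash matches the
// recomputed hash of the block before it. Tampering with block 10 breaks
// the check at block 11, so the whole suffix becomes invalid at once.
bool chainIsValid(const std::vector<Block>& chain) {
    for (std::size_t i = 1; i < chain.size(); ++i) {
        if (chain[i].prevHash != toyHash(chain[i - 1])) return false;
    }
    return true;
}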
What I understand about blockchain is that:
Blocks are secured by the hash.
Transactions are secured by the Merkle tree.
Does this mean that the Merkle tree is not involved at all in securing the blocks?
If so, what prevents us from changing the transactions if we know the hash of older blocks in the chain?
Please note that I'm assuming we are using a blockchain with only one node, and I want to know how hard it is to hack the blockchain on one node. As far as I understand, the hashing alone is very secure, but distributing the blockchain across multiple nodes makes it even more secure.
Blocks are secured with the proof of work. The proof of work is a measure related to how many hashes (on average) it would take to get a block hash equal to the network target value. The lower the target value, the more work was done on the block, and the harder it is to change or "hack" the data in the block and still remain a valid block (because you must do the work again).
The merkle root is just a way to represent all of the transactions in the block in a single hash value, which is part of the data that is hashed to produce the block hash. If you change any of the transaction data, it will produce a different merkle root, and that will make the resulting block hash different too, and now the proof of work must be done again before the block would be considered valid.
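As a rough illustration of how the merkle root ties the transactions to the block hash (a toy sketch: std::hash stands in for double-SHA256, and the duplication of an odd last leaf mirrors Bitcoin's rule):

#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

uint64_t toyHash(const std::string& data) { return std::hash<std::string>{}(data); }

// Pair up transaction hashes level by level until one root remains.
// Changing any transaction changes its leaf, every parent above it,
// and therefore the root that goes into the block header.
uint64_t merkleRoot(std::vector<uint64_t> level) {
    if (level.empty()) return 0;
    while (level.size() > 1) {
        if (level.size() % 2 != 0) level.push_back(level.back());  // duplicate odd last leaf
        std::vector<uint64_t> next;
        for (std::size_t i = 0; i < level.size(); i += 2) {
            next.push_back(toyHash(std::to_string(level[i]) + std::to_string(level[i + 1])));
        }
        level = std::move(next);
    }
    return level[0];
}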
Now, with only one node, it does not matter. If you are able to change the data in a block and rehash that block with a new valid hash (one that is equal to or lower than the network target value), you have a new block, but the node will reject it because it already has that block. You must also mine the next block before anyone else, because one of the consensus rules is that the longest valid chain always wins.
Having only one node running means that node can be changed by the person who is running it, possibly without anyone else knowing. This might remove certain rules that you thought were being followed, which could reverse one of your transactions, so it is good to run your own node to make sure the rules are being followed.
Hi, imagine I have code like this:
0. void someFunction()
1. {
2. ...
3. if(x>5)
4. doSmth();
5.
6. writeDataToCard(handle, data1);
7.
8. writeDataToCard(handle, data2);
9.
10. incrementDataOnCard(handle, data);
11. }
The thing is the following: if steps 6 and 8 get executed, and then someone, say, removes the card, then operation 10 will not be completed successfully. But this would be a bug in my system. Meaning: if 6 and 8 are executed, then 10 MUST also be executed. How do I deal with such situations?
Quick summary: what I mean is that after step 8 someone may remove my physical card, which means that step 10 will never be reached, and that will cause a problem in my system. Namely, the card will be initialized with incomplete data.
You will have to create some kind of protocol, for instance you write to the card a list of operations to complete:
Step6, Step8, Step10
and as you complete the tasks you remove the entry from the list.
When you reread the data from the card, you check whether any entry remains in the list. If one does, the operation did not complete successfully before, and you restore a previous state.
Unless you can somehow physically prevent the user from removing the card, there is no other way.
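A minimal sketch of that idea; the journal here is an in-memory stand-in (in practice it would be persisted on the card with whatever read/write calls your card library offers), and the commented-out calls mark where the real card operations go:

#include <algorithm>
#include <string>
#include <vector>

static std::vector<std::string> journal;   // stand-in for a journal stored on the card

void writeJournal(const std::vector<std::string>& pending) { journal = pending; }
void markDone(const std::string& step) {
    journal.erase(std::remove(journal.begin(), journal.end(), step), journal.end());
}
bool needsRecovery() { return !journal.empty(); }

void runTransaction() {
    writeJournal({"write1", "write2", "increment"});        // record the plan first
    /* writeDataToCard(handle, data1); */   markDone("write1");
    /* writeDataToCard(handle, data2); */   markDone("write2");
    /* incrementDataOnCard(handle, data); */ markDone("increment");
}

// On startup or card insertion: needsRecovery() == true means the previous
// transaction was interrupted, so restore the previous state.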
If the transaction is interrupted then the card is in a fault state. You have three options:
Do nothing. The card is in a fault state, and it will remain there. Advise users not to play with the card. The card can be eligible for a complete clean or format.
Roll back the transaction the next time the card becomes available. You need enough information on the card and/or some central repository to perform the rollback.
Complete the transaction the next time the card becomes available. You need enough information on the card and/or some central repository to perform the completion.
In all three cases you need to have a flag on the card denoting a transaction in progress.
More details are required in order to answer this.
However, making some assumptions, I will suggest two possible solutions (more are possible...).
I assume the write operations are persistent, so data written to the card is still there after the card is removed and reinserted, and that you are referring to the coherency of the data on the card, not the state of the program performing the function calls.
It is also assumed that the increment method increments data already written, and that the system must have this operation done in order to guarantee consistency:
For each record written, maintain another data element (on the card) that indicates the record's state. This state is initialized to something (say a WRITING state) before performing the writeData operation, and is then set to WRITTEN after the incrementData operation has (successfully!) completed.
When reading from the card, you first check this state and ignore (or delete) the record if it's not WRITTEN.
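A minimal sketch of this first option (the Record layout and state values are hypothetical placeholders for whatever your card format allows):

enum class RecordState { WRITING, WRITTEN };

struct Record {          // stored on the card next to the payload
    RecordState state;
    int         data;
};

void writeRecord(Record& r, int value) {
    r.state = RecordState::WRITING;   // persist the state first
    r.data  = value;                  // the writeDataToCard / incrementDataOnCard steps
    r.state = RecordState::WRITTEN;   // only after the increment succeeded
}

// When reading back, anything still marked WRITING was interrupted
// and is ignored or deleted.
bool recordIsValid(const Record& r) { return r.state == RecordState::WRITTEN; }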
Another option is to maintain two (persistent) counters on the card: one counting the number of records that began writing, the other counting the number of records that finished writing.
You increment the first before performing the write, and then increment the second after (successfully) performing the incrementData call.
When later reading from the card, you can easily check if a record is indeed valid, or need to be discarded.
This option is valid if the written records are somehow ordered or indexed, so you can see which and how many records are valid just by checking the counters. It has the advantage of requiring only two counters for any number of records (compared to one state for EACH record in option 1).
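A sketch of this second option, with hypothetical names for the two persistent counters:

#include <cstdint>

struct CardHeader {       // kept in persistent storage on the card
    uint32_t begun = 0;   // incremented BEFORE a record is written
    uint32_t ended = 0;   // incremented AFTER its incrementData step succeeded
};

void writeRecord(CardHeader& hdr /*, record payload */) {
    ++hdr.begun;                       // persist first
    // writeDataToCard(...); incrementDataOnCard(...);
    ++hdr.ended;                       // persist last
}

// Records are written in order, so only the first `ended` records are valid;
// anything between `ended` and `begun` was interrupted and must be discarded.
uint32_t validRecordCount(const CardHeader& hdr) { return hdr.ended; }
bool needsCleanup(const CardHeader& hdr) { return hdr.begun != hdr.ended; }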
On the host (software) side you then need to check that the card is available prior to beginning the write (don't write if it's not there). If after the incrementData op you detect that the card was removed, you need to be sure to tidy things up (remove unfinished records, update the counters) either once you detect that the card has been reinserted, or before doing another write. For this you'll need to maintain state information on the software side.
Again, the type of solution (out of many more) depends on the exact system and requirements.
Isn't that just:
Copy data to temporary_data.
Write to temporary_data.
Increment temporary_data.
Rename data to old_data.
Rename temporary_data to data.
Delete the old_data.
You will still have a race condition (if a user removes the card at just the wrong moment) at the two rename steps, but you might be able to restore the data or temporary_data.
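Assuming the card is mounted as a filesystem and the data lives in a single file, a sketch of those steps with std::filesystem (C++17); the file names mirror the list above:

#include <filesystem>
namespace fs = std::filesystem;

// The updated copy only replaces "data" once it is completely written, so an
// interrupted update leaves either the old file intact or a recoverable
// temporary_data / old_data to clean up on the next insertion.
void updateData(const fs::path& dir) {
    fs::copy_file(dir / "data", dir / "temporary_data",
                  fs::copy_options::overwrite_existing);
    // ... write to and increment temporary_data here ...
    fs::rename(dir / "data", dir / "old_data");
    fs::rename(dir / "temporary_data", dir / "data");
    fs::remove(dir / "old_data");
}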
You haven't said what you're incrementing (or why), or how your data is structured (presumably there is some relationship between whatever you're writing with writeDataToCard and whatever you're incrementing).
So, while there may be clever techniques specific to your data, we don't have enough to go on. Here are the obvious general-purpose techniques instead:
the simplest thing that could possibly work - full-card commit-or-rollback
Keep two copies of all the data, the good one and the dirty one. A single byte at the lowest address is sufficient to say which is the current good one (it's essentially an index into an array of size 2).
Write your new data into the dirty area, and when that's done, update the index byte (so swapping clean & dirty).
Either the index is updated and your new data is all good, or the card is pulled out and the previous clean copy is still active.
Pro - it's very simple
Con - you're wasting exactly half your storage space, and you need to write a complete new copy to the dirty area when you change anything. You haven't given enough information to decide whether this is a problem for you.
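A compact sketch of the index-byte idea (the 512-byte copies and field names are made up; adjust to your card layout):

#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct CardImage {
    uint8_t active = 0;                          // 0 or 1: which copy is the good one
    std::array<std::array<uint8_t, 512>, 2> copy;
};

void commit(CardImage& card, const uint8_t* newData, std::size_t len) {
    uint8_t dirty = 1 - card.active;
    std::memcpy(card.copy[dirty].data(), newData, len);  // write the dirty area first
    card.active = dirty;  // single-byte switch: either it lands and the new data is
                          // live, or the card is pulled and the old copy stays active
}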
... now using less space ... - commit-or-rollback smaller subsets
If you can't waste 50% of your storage, split your data into independent chunks and version each of them independently. Now you only need enough space to duplicate your largest single chunk, but instead of a single index byte you need an offset or pointer for each chunk.
Pro - still fairly simple
Con - you can't handle dependencies between chunks, they have to be isolated
journalling
As per RedX's answer, this is used by a lot of filesystems to maintain integrity.
Pro - it's a solid technique, and you can find documentation and reference implementations for existing filesystems
Con - you just wrote a modern filesystem. Is this really what you wanted?
I've written a 'server' program that writes to shared memory, and a client program that reads from the memory. The server has different 'channels' it can write to, which are just different linked lists that it appends items to. The client is interested in some of the linked lists, and wants to read every node that's added to those lists as it comes in, with the minimum latency possible.
I have 2 approaches for the client:
For each linked list, the client keeps a 'bookmark' pointer to keep its place within the linked list. It round robins the linked lists, iterating through all of them over and over (it loops forever), moving each bookmark one node forward each time if it can. Whether it can is determined by the value of a 'next' member of the node. If it's non-null, then jumping to the next node is safe (the server switches it from null to non-null atomically). This approach works OK, but if there are a lot of lists to iterate over, and only a few of them are receiving updates, the latency gets bad.
The server gives each list a unique ID. Each time the server appends an item to a list, it also appends the ID number of the list to a master 'update list'. The client only keeps one bookmark, a bookmark into the update list. It endlessly checks if the bookmark's next pointer is non-null ( while(node->next_ == NULL) {} ), if so moves ahead, reads the ID given, and then processes the new node on the linked list that has that ID. This, in theory, should handle large numbers of lists much better, because the client doesn't have to iterate over all of them each time.
When I benchmarked the latency of both approaches (using gettimeofday), to my surprise #2 was terrible. The first approach, for a small number of linked lists, would often be under 20us of latency. The second approach would have brief stretches of low latency, but would often be between 4,000-7,000us!
Through inserting gettimeofday's here and there, I've determined that all of the added latency in approach #2 is spent in the loop repeatedly checking if the next pointer is non-null. This is puzzling to me; it's as if the change in one process is taking longer to 'publish' to the second process with the second approach. I assume there's some sort of cache interaction going on I don't understand. What's going on?
Update: Originally, approach #2 used a condition variable, so that if node->next_ == NULL it would wait on the condition, and the server would notify on the condition every time it issued an update. The latency was the same, and in trying to figure out why, I reduced the code down to the approach above. I'm running on a multicore machine, so one process spinlocking shouldn't affect the other.
Update 2: node->next_ is volatile.
Since it sounds like reads and writes are occurring on separate CPUs, perhaps a memory barrier would help? Your writes may not be occurring when you expect them to be.
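One way to express that barrier in modern C++ is to make next_ a std::atomic and pair a release store on the writer with an acquire load on the reader (a sketch, assuming C++11 atomics are available, in place of volatile):

#include <atomic>

struct Node {
    int                id;
    std::atomic<Node*> next_{nullptr};
};

// Producer (server): fill in the node, then publish it with a release store.
void publish(Node* tail, Node* fresh) {
    tail->next_.store(fresh, std::memory_order_release);
}

// Consumer (client): the acquire load pairs with the release store, so once it
// sees a non-null pointer, the node's contents are guaranteed to be visible too.
Node* waitForNext(Node* tail) {
    Node* next;
    while ((next = tail->next_.load(std::memory_order_acquire)) == nullptr) {
        // optionally yield or pause here to reduce the cost of spinning
    }
    return next;
}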
You are doing a spin lock in #2, which is generally not such a great idea, and it is chewing up cycles.
Have you tried adding a yield after each failed polling-attempt in your second approach? Just a guess, but it may reduce the power-looping.
With Boost.Thread this would look like this:
while (node->next_ == NULL) {
    boost::this_thread::yield();
}