Bitcoin Merkle Tree Verification / Development - blockchain

I have recently been researching Bitcoin because I find it fascinating. I came up with a few questions and would really appreciate it if someone could answer them.
I do not understand how transactions can be verified using just the Merkle root. The block headers only contain the Merkle root, but to verify whether the transactions in the block are valid, you still have to hash all of the transactions and compare the result with the Merkle root. Am I missing something?
It seems like the Bitcoin source code can be updated: https://github.com/bitcoin/bitcoin/blob/master/CONTRIBUTING.md. If this is the case, how can we say that Bitcoin is a permanent, decentralized store of value? We don't know how the system will change in the future. Also, who is in charge of developing Bitcoin? If there is an institution in charge of this, how can we say it is fully decentralized?
Thanks!

The transaction identifier, a.k.a. txid, is generated by hashing the serialized transaction data twice with SHA-256. The Merkle root is the root of a Merkle tree that takes all the txids in the block as inputs. To quote this excellent website (learnmeabitcoin):
"[Merkle root] gives you a short yet unique fingerprint for all the transactions in a block".
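To make the Merkle-root answer concrete, and to address why not everything has to be rehashed, here is a minimal Python sketch. It builds the root from a list of txids the way Bitcoin does (double SHA-256, reusing the last hash on odd-sized levels) and adds a Merkle branch: a lightweight client holding only the block header can check that one transaction is in the block using about log2(n) sibling hashes instead of every transaction. Assumptions: txids are raw 32-byte digests in internal byte order (block explorers display them byte-reversed), SegWit details are ignored, and the function names are illustrative, not Bitcoin Core APIs. A full node validating a whole block does still hash every transaction.

```python
import hashlib


def dsha256(data: bytes) -> bytes:
    """Bitcoin's double SHA-256: SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def _next_level(level: list[bytes]) -> list[bytes]:
    """Hash adjacent pairs; an odd-length level reuses its last hash as a partner."""
    if len(level) % 2 == 1:
        level = level + [level[-1]]
    return [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(txids: list[bytes]) -> bytes:
    """Fold all txids of a block up to the single root stored in the header."""
    if not txids:
        raise ValueError("a block always contains at least one transaction")
    level = list(txids)
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def merkle_branch(txids: list[bytes], index: int) -> list[bytes]:
    """Collect the sibling hashes needed to prove txids[index] is in the tree."""
    branch, level = [], list(txids)
    while len(level) > 1:
        sibling = index ^ 1  # the other member of this node's pair
        branch.append(level[sibling] if sibling < len(level) else level[index])
        level = _next_level(level)
        index //= 2
    return branch


def verify_branch(txid: bytes, index: int, branch: list[bytes], root: bytes) -> bool:
    """Recompute the root from one txid plus its siblings and compare."""
    h = txid
    for sibling in branch:
        # An odd index means we are the right-hand member of the pair.
        h = dsha256(sibling + h) if index & 1 else dsha256(h + sibling)
        index //= 2
    return h == root
```

For example, with five stand-in transactions `txids = [dsha256(f"tx{i}".encode()) for i in range(5)]`, the call `verify_branch(txids[3], 3, merkle_branch(txids, 3), merkle_root(txids))` returns `True`, and the branch it checks contains only three hashes.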
Yes, the Bitcoin source code can be, and has been, updated for years. Throughout Bitcoin's history, plenty of groups have modified (forked) the Bitcoin source code and run their own version of the software.
Many Bitcoin forks have been listed, and some of them are still running. We don't know how the system will change in the future, but if you don't like the changes, you are free to fork your own version and gather a community around it. The community is crucial: the contributors (miners, developers, writers…) and users give the project its strength, and as long as there is a community around Bitcoin, it can exist.

Related

Is it impossible to verify that a website is using their publicly released source code in production?

This is a bit of an oddball question, and I've seen nothing similar asked anywhere on the internet.
What I want to do
I want to release the source code of my website. Beyond that, the nature of the website (described below) is such that I not only want to release the source code, but I want users of my website to be able to unequivocally verify that they're using the exact version of the website as is in the source code dump.
The part that makes it tough is that my good will cannot be trusted. (Obviously it can, since I'm going to these lengths, but from the user's perspective it cannot.)
My mental exercises to try to fix this
Attempt 1
So, the first thing I thought about was hashing the source code, or even hashing the entire docker container it's running in, and providing an endpoint that broadcasts that hash, so that it may be matched up against the public source code.
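Attempt 1 amounts to publishing a fingerprint of the source tree. A minimal Python sketch of such a fingerprint follows; the function name is hypothetical, and it covers only file paths and contents (not permissions, timestamps, or the build environment). As Attempt 2 points out, a matching digest only proves what the server chose to hash, not what it is actually running.

```python
import hashlib
from pathlib import Path


def hash_source_tree(root: str) -> str:
    """Return one SHA-256 hex digest covering every file under `root`.

    Files are processed in sorted order of their relative POSIX paths, so the
    result does not depend on the order the filesystem lists them in.
    """
    base = Path(root)
    files = sorted((p for p in base.rglob("*") if p.is_file()),
                   key=lambda p: p.relative_to(base).as_posix())
    digest = hashlib.sha256()
    for path in files:
        rel = path.relative_to(base).as_posix().encode("utf-8")
        data = path.read_bytes()
        # Length-prefix each field so different path/content splits can't collide.
        digest.update(len(rel).to_bytes(8, "big") + rel)
        digest.update(len(data).to_bytes(8, "big") + data)
    return digest.hexdigest()
```

Anyone with a checkout of the public repository can run the same function and compare the result with the digest the site broadcasts.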
Attempt 2
The second thing I thought about was providing users with a read-only shell login so that they would be able to hash the Docker image that's being run. The problem here is that there's no way to verify that the Docker image is what is actually running (to my knowledge). I could just build an image of the public source code and put it there for users to hash.
Also, because of the security concern, I really hate the idea of putting users that close to production.
Attempt 3
Finally, I wondered if it would solve the problem if I used some type of blockchain technology, like a distributed app. But that's so complex, and I don't think it provides extra trust.
Why I want to do this
I am building a website that will be handling incredibly personal data. It could destroy people's lives if any of it were leaked (and no, the TLD isn't .xxx or anything like that; in fact, nothing illegal is going on with this data). However, there is an intense social stigma associated with this type of data, and in some countries it is actually enough evidence to pursue the death penalty against a user if any data is leaked.
So, in addition to having a very explicit (and secure) Privacy Policy, I want to make an open-source promise to my users, so that problems can be hunted down by volunteers and quickly eliminated, and so that users can verify that I'm not adding code to the running production version to enable extra spying on them.
Is this type of thing theoretically possible?
As long as the server receives some of this incredibly personal user data in a form that it (or you) can read, there is no perfect way. You would have to let users encrypt the data before uploading it, with a key only they know.
If there is no data transfer involved, users can check the code in the browser and compare hashes manually. Perhaps there is an automatic way of doing this. Either way, all work on the data then has to be performed on the client side.
The core problem is that a user has to trust the complete environment, from source code over compiler to executing OS and hardware.
You cannot cryptographically ensure that you do not for example intercept the running program on a low layer and read out data there, even if you had a possibility to ensure that you run the exact code on the server.
Trust is something you build through legal guarantees and privacy policies.
We all willingly keep our sensitive data online in many applications/systems and we never check to see if the source code or the architecture of the system is the same as what they promised at the beginning.
As #tystackoverflow said, keeping backdoors out of the code is not the only way to assure the users of a system that their data is not accessible to anyone else. Your system architecture should also encrypt the data at a higher level, so that no one can access it directly except through the system.
I do understand the risk of this data leaking. It all depends on how you design the system to be tamper-proof and secure, and how you convey the ideas behind the security measures you have built into the application to its users.
Good Luck with the Project !

How is a blockchain system (for instance Ethereum) secure if I can change the code of the node and run it as usual?

Let's take Ethereum, for instance (this question applies to all public blockchains). Can I change the code (for example, send transactions without deducting the account balance) and just run it as usual? Is there any underlying mechanism that prevents this? I mean, how is the system secure if I could run a modified version of it?
As for your first question: unless you fork and modify the Ethereum code, you can't make the changes you describe (sending transactions without deducting the balance). That pretty much answers your last question too. You can find everything you need to deploy code or get the original source code here; feel free to modify it: https://github.com/ethereum
Also, the way Ethereum works is not as simple as you think. You can't just write code (smart contracts), deploy it on the Ethereum blockchain, and expect it to run for free. You need ether for that, which acts as fuel (gas) for running code on Ethereum. To understand how Ethereum's security works, you first need to understand the security of blockchains in general. I won't go into the details, but keep in mind that the further back in the chain a block is, the harder it is to modify. Ethereum is also quite secure, at least as much as you can expect from a blockchain. Security holes do seem to pop up every now and then, but after the roughly $50M DAO hack in 2016 and the hard fork that followed, Ethereum is pretty secure against average hackers.
So, to sum it up: to modify the code you need to make a fork, and that fork is not part of the main Ethereum network. So you aren't running a modified version of Ethereum on the main chain itself, and hence it remains secure. Also, you are quite limited when writing smart contracts, and they don't allow direct modification of the blockchain.
Say you run an unmodified Ethereum node and receive 3 ether in your account from somewhere. Now you rebuild your node with a modification so that you send 2 ether to me, but it does not deduct 2 from your balance.
What happens is that I and the rest of the world will verify and process the transaction you signed, and deduct 2 ether from your balance. We don't run the modified version, right?
You see what's going on? The next time you try to spend another 2 ether, your node will think you still have 3 ether and will sign and broadcast a transaction, but everyone else in the world will reject it, since they know you only have 1 ether.
So you only harm yourself.
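The scenario above can be simulated with a toy model. This Python sketch (a hypothetical `Node` class with whole-ether balances and no signatures, gas, or consensus) gives every node its own copy of the balances and its own validation rule; only your node runs the modified rule that skips the debit.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy node: keeps its own copy of balances and validates transfers itself."""
    balances: dict[str, int] = field(default_factory=dict)
    modified: bool = False  # True for the "cheating" build that never debits

    def apply(self, sender: str, receiver: str, amount: int) -> bool:
        """Apply a transfer under this node's rules; return whether it was accepted."""
        if self.modified:
            # Modified build: credit the receiver but leave the sender untouched.
            self.balances[receiver] = self.balances.get(receiver, 0) + amount
            return True
        if self.balances.get(sender, 0) < amount:
            return False  # honest rule: you cannot spend more than you have
        self.balances[sender] = self.balances.get(sender, 0) - amount
        self.balances[receiver] = self.balances.get(receiver, 0) + amount
        return True


yours = Node({"you": 3}, modified=True)
everyone_else = [Node({"you": 3}) for _ in range(4)]

for attempt in (1, 2):
    accepted = sum(node.apply("you", "me", 2) for node in everyone_else)
    yours.apply("you", "me", 2)
    print(f"spend #{attempt}: your node says you have {yours.balances['you']}, "
          f"{accepted}/{len(everyone_else)} other nodes accepted")
```

Running it prints `spend #1: your node says you have 3, 4/4 other nodes accepted` and then `spend #2: your node says you have 3, 0/4 other nodes accepted`: your node's view diverges, but the rest of the network never accepts the second spend.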

Is Blockchain a distributed database? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
Most articles describe blockchain as a distributed database. Does that mean we can store any type of data in a blockchain, such as audio, video, or PDF files?
Think of blockchain as a relatively slow, very expensive database that provides excellent resistance to hacking and corruption. It's a Write-Once, Read Mostly (WORM) system.
You absolutely could store any data you want in a hypothetical blockchain. The practical limits are, you don't want to store very large chunks of data (so, not video); you probably don't want to store frequently changing data (so, not a thesis paper you're revising) -- unless it's important somehow to record every single change forever.
That's because its other feature is that once something is written to a blockchain, it's there forever.
Need to fix a typo? Then you add a new record with a correction.
Need to delete a record? Too bad, you can't. Best you can do is enter a new record saying that the record you wish to delete is "obsolete" or "repudiated" or "no longer valid" or "should be considered as deleted."
In short, it's wise to treat your blockchain as a permanent record.
1 Slow: the Bitcoin blockchain runs about 3 transactions per second (tps) and the Ethereum blockchain runs about 30 tps.
2 Expensive: the Bitcoin blockchain cost an average of US$ 8.22 per transaction in November 2017 according to Digiconomist.
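The "never edit, only append" rule described in this answer can be sketched as a tiny write-once log in Python. It is a hypothetical in-memory structure, not how any particular blockchain stores data: corrections and "deletions" are just new records that point at the record they supersede, and the full history stays readable.

```python
from dataclasses import dataclass
from typing import Optional

TOMBSTONE = "<deleted>"  # hypothetical marker: "treat the superseded record as deleted"


@dataclass(frozen=True)
class Record:
    seq: int
    payload: str
    supersedes: Optional[int] = None  # seq of the earlier record this one corrects


class AppendOnlyLog:
    """Write-once, read-mostly log: records can be added, never edited or removed."""

    def __init__(self) -> None:
        self._records: list[Record] = []

    def append(self, payload: str, supersedes: Optional[int] = None) -> int:
        if supersedes is not None and not 0 <= supersedes < len(self._records):
            raise ValueError("can only supersede an existing record")
        record = Record(len(self._records), payload, supersedes)
        self._records.append(record)
        return record.seq

    def history(self) -> tuple[Record, ...]:
        """Every record ever written, including the ones later corrected."""
        return tuple(self._records)

    def current(self) -> list[str]:
        """Payloads still in force: not superseded and not tombstones."""
        superseded = {r.supersedes for r in self._records if r.supersedes is not None}
        return [r.payload for r in self._records
                if r.seq not in superseded and r.payload != TOMBSTONE]
```

Fixing a typo means appending the corrected text with `supersedes=` pointing at the old record; "deleting" means appending `TOMBSTONE` the same way. `current()` then hides both originals, while `history()` still shows everything.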
It depends on what type of data you are storing. If it is a string or a JSON object, you can extend the block structure to store it on the chain directly. If it is a picture, video, or other large file, you can store the file's hash on the blockchain and keep the original file in cloud storage.
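A minimal Python sketch of the large-file option: only a small JSON record (digest, size, and where the file lives) would be written to the chain, and anyone who later downloads the file can check it against that record. The function names and the example location string are hypothetical placeholders, and actually submitting the record to a chain is out of scope.

```python
import hashlib
import json


def make_anchor(data: bytes, location: str) -> str:
    """Build the small record that would go on-chain in place of the file itself."""
    return json.dumps({"sha256": hashlib.sha256(data).hexdigest(),
                       "size": len(data), "location": location}, sort_keys=True)


def verify_download(anchor_json: str, downloaded: bytes) -> bool:
    """Check that bytes fetched from cloud storage match the on-chain record."""
    anchor = json.loads(anchor_json)
    return (len(downloaded) == anchor["size"]
            and hashlib.sha256(downloaded).hexdigest() == anchor["sha256"])
```

The on-chain record stays the same small size whether the file is a few bytes or several gigabytes, which is what keeps this approach cheap.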
If you asked because of the statement "blockchain is a distributed database", as used when explaining blockchain in blogs and video tutorials, here is some further clarification:
1. Blockchain is not a distributed database technology if you compare it with other RDBMS/NoSQL databases.
2. Blockchain is, in a sense, a distributed database: it has distributed nodes in the network that all hold a consistent copy of the ledger. These distributed ledgers can be maintained in any kind of database technology, and they use cryptography to provide decentralized multi-version concurrency control and to maintain consensus about what exists.
Refer to the link for further explanation of blockchain as a distributed database and similar topics.
It's probably better to think of the blockchain as a distributed ledger, i.e., ledger data that is shared among a number of actors. The reason the database analogy doesn't work is addressed by one of the other answers: all changes have to be additions or amendments, because the ledger itself is immutable. Any database that can't modify data is hobbled, to say the least; however, the blockchain is more about an unchanging historical record than about storing data for manipulation.
You can put whatever data you want onto the blockchain, but considering how data is added and the fact that ALL CHANGES are recorded, the smaller the data, the better.
Blockchain was first applied in Bitcoin. The main idea behind blockchain is decentralization. A blockchain consists of blocks, and each block contains information about the previous block and the current block. Whatever information you store (audio, video, PDF) has to be hashed (to produce a digital fingerprint).
You can think of it like this: car-sharing companies nowadays are trying to bring blockchain into their systems. Once you rent a car, all of your information is stored persistently and immutably for that car. The next renter will see information about the previous user, which will help them drive safely :) or something else.
A blockchain is just a data structure composed of blocks. These blocks form a chain. It is a distributed ledger, which means that every "node" (computer) in the network has a copy of the ledger.
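The "blocks form a chain" idea can be shown in a few lines of Python: each block stores the hash of the block before it, so changing any earlier block breaks the link that follows it. This toy (hypothetical `Block`, `append_block`, `is_valid`) leaves out everything that makes a real network work, such as proof of work, signatures, and peer-to-peer replication of the copies.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class Block:
    index: int
    prev_hash: str  # digest of the previous block ("0" * 64 for the first block)
    data: str

    def digest(self) -> str:
        # Hash a canonical JSON encoding so every copy computes the same value.
        body = json.dumps({"index": self.index, "prev_hash": self.prev_hash,
                           "data": self.data}, sort_keys=True)
        return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain: list[Block], data: str) -> None:
    prev = chain[-1].digest() if chain else "0" * 64
    chain.append(Block(len(chain), prev, data))


def is_valid(chain: list[Block]) -> bool:
    """True if every block still points at the current digest of its predecessor."""
    return all(chain[i].prev_hash == chain[i - 1].digest() for i in range(1, len(chain)))
```

Editing `chain[1].data` after the fact makes `is_valid(chain)` return `False`, because `chain[2].prev_hash` no longer matches; since every node holds its own copy, the tampered copy is easy to spot.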
The blockchain is something that uses the functionality of a distributed database to commit transactions between the peer nodes that are part of the ecosystem. It's not just distributed computing; it adds encryption, nodes, ledgers, digital signing, and many other things. You could say it's a skyscraper built on what distributed computing does.
There are also private and public blockchain networks, such as IBM Hyperledger Fabric, Ethereum, and R3 Corda.

cleaning up missed geocoding (or general advice on data cleaning)

I've got a rather large database of location addresses (500k+) from around the world, though many of the addresses are duplicates or near-duplicates.
Whenever a new address is entered, I check whether it is already in the database, and if so, I take the existing lat/long and apply it to the new entry.
The reason I don't link to a separate table is that the addresses are not used as a group to search on, and there are often enough differences between addresses that I want to keep them distinct.
If I have a complete match on the address, I apply that lat/long. If not, I go to city level and apply that, if I can't get a match there, I have a separate process to run.
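The lookup order described above (exact address, then city, then the separate process) might look like the following Python sketch. How the matching is actually done isn't stated, so the `normalize` rule here is a hypothetical stand-in for whatever near-duplicate matching is used.

```python
from typing import Optional

LatLon = tuple[float, float]


def normalize(text: str) -> str:
    """Hypothetical matching rule: lowercase, drop commas, collapse whitespace."""
    return " ".join(text.lower().replace(",", " ").split())


def lookup(address: str, city: str,
           by_address: dict[str, LatLon],
           by_city: dict[str, LatLon]) -> tuple[Optional[LatLon], str]:
    """Return (lat/long or None, which level matched)."""
    hit = by_address.get(normalize(address))
    if hit is not None:
        return hit, "address"          # complete match on the address
    hit = by_city.get(normalize(city))
    if hit is not None:
        return hit, "city"             # fall back to city level
    return None, "needs_geocoding"     # hand off to the separate process
```

Writing it out makes one risk visible: if the city-level key is just a city name, a name shared by several places (there are many Springfields) can hand a new entry another place's coordinates, so keying the city table by city plus region or country would be safer.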
Now that you have the extensive background, here is the problem. Occasionally I end up with a lat/long that is far outside the normal acceptable range of error. Strangely, though, it is usually just one or two of these lat/longs that fall outside the range, while the rest of the entries with the same city name are stored in the database correctly.
How would you recommend cleaning up the data? I've got the GeoNames database, so theoretically I have the correct data. What I'm struggling with is what routine to run to get this done.
If someone could point me in the direction of some (low level) data scrubbing direction, that would be great.
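One routine that fits the symptom described (a few stray points among many correct ones for the same city) is to compare each point with the median location of its city and flag the ones that are too far away. A Python sketch follows; the 50 km threshold and the `city_key` format are assumptions to tune, and flagged rows would then be re-geocoded (for example against the GeoNames data mentioned above) or reviewed by hand. It ignores the longitude wrap-around at ±180°, so cities near the antimeridian need special handling.

```python
import math
from collections import defaultdict
from statistics import median


def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def find_outliers(rows, max_km=50.0):
    """Flag rows whose point is far from the median point of their city.

    `rows` is an iterable of (row_id, city_key, lat, lon). The median is used
    because one or two bad points barely move it, unlike the mean.
    """
    by_city = defaultdict(list)
    for row_id, city_key, lat, lon in rows:
        by_city[city_key].append((row_id, lat, lon))
    flagged = []
    for city_key, pts in by_city.items():
        if len(pts) < 3:
            continue  # too few points to tell which one is wrong
        centre = (median(p[1] for p in pts), median(p[2] for p in pts))
        for row_id, lat, lon in pts:
            if haversine_km((lat, lon), centre) > max_km:
                flagged.append((row_id, city_key))
    return flagged
```

Running `find_outliers` over the whole table once, and then on each newly geocoded batch, keeps the check cheap: it only needs one pass to group rows and one pass per city.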
This is an old question, but true principles never die, right?
I work in the address verification industry for a company called SmartyStreets. When you have a large list of addresses that needs to be "cleaned up" and polished to official standards, and you will rely on it for any aspect of your operations, you had best look into CASS-Certified software (US only; other countries vary widely, and many don't offer such a service officially).
The USPS licenses CASS-Certified vendors to "scrub" or "clean up" (meaning: standardize and verify) address data. I would suggest that you look into a service such as SmartyStreets' LiveAddress to verify addresses or process a list all at once. There are other options, but I think this is the most flexible and affordable for you. You can scrub your initial list then use the API to validate new addresses as you receive them.
Update: I see you're using JSON for various things (I love JSON, by the way, it's so easy to use). There aren't many providers of the services you need which offer it, but SmartyStreets does. Further, you'll be able to educate yourself on the topic of address validation by reading some of the resources/articles on that site.

Incorporating shareware restrictions in C++ software

I wish to implement my software on a shareware basis, so that the user is given a maximum trial period of (say) 30 days with which to try out the software. On purchase, I intend the user to be given a randomly generated key which, when entered, enables the software again.
I've never been down this route before, so any advice or feedback or pointers to 'standard' ways of how this is done would be much appreciated.
I do not anticipate users cheating by changing the system date or anything like that, though this is probably worth considering. Apologies if this topic has appeared before.
With regard to a randomly generated key: how will you verify whether a key is legitimate or bogus if it is actually random? Have a look at the article "Implementing a Partial Serial Number Verification System"; it is quite good and easy to implement in any language.
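The point about random keys can be illustrated with a simpler, swapped-in scheme than the one in that article: the key is a random seed plus check characters derived from it with HMAC-SHA256 and a vendor secret, so keys look random but the program can still tell real ones from made-up ones. This is a hypothetical Python sketch, not the article's algorithm. Its weakness is that the secret ships inside the program, so a cracker who extracts it can mint keys, which is the problem partial verification is designed to limit.

```python
import hashlib
import hmac
import secrets

# Hypothetical vendor secret. Anything shipped inside the program can be
# extracted by a determined cracker.
SECRET = b"replace-with-your-own-secret"
HEX = set("0123456789ABCDEF")


def _check_part(seed: str) -> str:
    """First 8 hex characters of HMAC-SHA256(secret, seed), upper-cased."""
    return hmac.new(SECRET, seed.encode("ascii"), hashlib.sha256).hexdigest()[:8].upper()


def generate_key() -> str:
    """Random 8-character seed followed by 8 check characters derived from it."""
    seed = secrets.token_hex(4).upper()
    check = _check_part(seed)
    return f"{seed[:4]}-{seed[4:]}-{check[:4]}-{check[4:]}"


def is_valid_key(key: str) -> bool:
    """Accept only keys whose check characters match their seed."""
    parts = key.strip().upper().split("-")
    if len(parts) != 4 or any(len(p) != 4 or not set(p) <= HEX for p in parts):
        return False  # wrong shape: reject before hashing anything
    seed, check = parts[0] + parts[1], parts[2] + parts[3]
    return hmac.compare_digest(_check_part(seed), check)
```

`generate_key()` runs on the vendor side at purchase time, and only `is_valid_key()` needs to be in the shipped program.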
With regard to time trials, a basic solution would be to compare your main executable file's creation time with the current system time and act on the difference. This assumes your installer sets the file's creation time to the time of installation rather than preserving the time you compiled it! :)
Also watch out for the clock changing radically, for example if the current date is magically earlier than the install date.
One way to get around this type of date lock is to set the system date years into the future before installing. So you should check that today's date is not earlier than the install date.
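The date checks from this answer boil down to a small decision function, sketched here in Python. `install_time` is whatever the program recorded at installation (the answer suggests the executable's creation time); the 30-day length and the status strings are illustrative.

```python
from datetime import datetime, timedelta

TRIAL = timedelta(days=30)


def trial_status(install_time: datetime, now: datetime) -> str:
    """Classify the trial from the recorded install time and the current clock."""
    if now < install_time:
        # The clock reads earlier than the install date: either the date was set
        # into the future before installing, or it was rolled back afterwards.
        return "clock_tampered"
    if now - install_time > TRIAL:
        return "expired"
    return "active"
```

Treating `"clock_tampered"` like `"expired"` (or showing a reminder) is a policy choice; the function only reports what it sees.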
If your software is really useful, you'll certainly find cracked copies on P2P networks before you see your first order. This will happen no matter how sophisticated the license enforcement code you implement is.
That said, just store the first-run date somewhere (perhaps the registry, if on Windows) and after 30 days refuse to start, or just open a reminder window.
Don't worry about cheaters, they'll find a way around your restrictions no matter what. Worry about your honest customers and try hard not to make their life harder.
Eric Sink has written more about this here (section 4).
On the first start, you can store the current date somewhere.
On each following start, look for the stored date; if it exists, read it, and if it is more than 30 days after the first start, stop the program.
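A Python sketch of this store-and-compare approach, using a small JSON file as the "somewhere". The file name and location are hypothetical (another answer suggests the registry on Windows), and a plain file like this is trivially deleted to reset the trial, in line with the earlier advice not to fight cheaters too hard.

```python
import json
from datetime import datetime, timedelta
from pathlib import Path

TRIAL_DAYS = 30


def first_run_date(state_file: Path, now: datetime) -> datetime:
    """Return the stored first-run date, recording `now` if none exists yet."""
    if state_file.exists():
        return datetime.fromisoformat(json.loads(state_file.read_text())["first_run"])
    state_file.write_text(json.dumps({"first_run": now.isoformat()}))
    return now


def may_start(state_file: Path, now: datetime) -> bool:
    """True while the trial is running; False once 30 days have passed."""
    return now - first_run_date(state_file, now) <= timedelta(days=TRIAL_DAYS)
```

At startup the program would call `may_start(path, datetime.now())` and either continue or show the purchase/reminder window.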
Please see this library.
Description:
Convert any application into time-limited shareware. Generate serial numbers to register it. A function library offering a flexible locking system with solid encryption. Easy to implement. Support for VB, C++, Delphi, other languages.