BlockChain for Storing personal Data - blockchain

I want to store Personel data to BlockChain for a company. We want to prove that the data is unchangeable. A Customer in the blockchain will not access or see any other customer data.
But Company will access all customer data and can make any operation and also can follow any operation, any access Log.
Company will store new form type(Personal data) and flag it as a personal data card.
Is it possible with Blockchain?

The best method would be to encrypt the data, but it really depends upon what you are doing with it. If you need to do operations on it, then you will have to use zk-SNARKs, but these are a new field and you would have to do a lot of research to get it working. If you aren't using the data for anything; it's just metadata, then why would you need it to be on a public ledger and validated?
Plus, there is one big problem about storing sensitive data on the blockchain: the blockchain is immutable and once something is on the blockchain, it is stored forever. So what if there comes a time when quantum computers become so powerful that they can break all encryption we have today? Then all your users' personal data will be public on the blockchain.

Related

How can I retrieve the entire old ethereum blockchain data for data processing

I want to get the "entire" ethereum blockchain data, not just from a few sets of smart contracts. By data I mean, transaction details including the generated logs.
I can get real-time data using Infura, but it's pretty much impossible to fetch all the old data, it would simply cost too much because I would simply have to do too many network requests.
I need the old data because I am trying to make an indexed database out of the "append-only" ethereum transaction data so that I can easily query it.
To be more precise, I would like to retrieve all NFT(ERC721, ERC1155) transfer transactions and their logs. So that I can do the following queries and much more: all the NFT owned by a particular wallet, transfer histories of a particular NFT token.
You can do this by
Run your own node
Query data from your node - locally it is fast
For some data, you might need to run the node in archival mode
You can use the same Web3 / JSON-RPC APIs on a local node than you are using on Infura.
Two solutions I have discovered.
Just like #Mikko has mentioned, you can run your own node. And it seemed not be as complex as I have expected. You can search for "geth" and then simply connect this node to your web3 library, just like connecting to Infura.
But I have not tried this and found a much better solution.
Google cloud Bigquery's public data set has all the old ethereum data. Bigquery is Google's data warehouse service, where you can use simple SQL to query your data. It adds new data every day. I have already tested some simple queries from its console and the result was good.
I am planning to fetch all the old data I need from bigquery and store it in my own database and afterwards get real time data from infura. Now that I dont have to fetch all the old data from infura, the price becomes very affordable.
you may check this https://github.com/blockchain-etl/ethereum-etl
It is a Python library for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, and internal transactions.
For example, you may run the cli command
> ethereumetl export_token_transfers --start-block 0 --end-block 500000 \
--provider-uri file://$HOME/Library/Ethereum/geth.ipc --output token_transfers.csv
You may export ERC20 and ERC721 transfers by specific the block number which enable you to query the old data.
Data is also available in Google BigQuery.

Where is the actual data stored in blockchain?

I know that the blockchain stores the transnational data and it is immutable but where is the actual data stored?
If there is a use case to replace a centralized data center solution with a blockchain solution, where will the data be stored in blockchain?
Data centers usually have petabytes of raw data, hence I am assuming that a decentralized solution like blockchain won't be able to accommodate large amounts of data.
Note: Many links on google say that blockchain is not an ideal solution for large data, but then any solution will eventually produce an ever increasing amount of data.
In most blockchains, such as bitcoin, every node contains a full set of all data, allowing all nodes to verify previous and new transactions. The data itself is normally stored in a local database, typically leveldb.
As you assumed, for this reason, distributed databases (blockchains) that require a fully copy of the dataset are not ideal for petabytes worth of data. The Bitcoin blockchain is currently roughly 270GB.

How would the data schema of bitcoin look like?

Since bitcoin is a blockchain and blockchain has been described as a kind of database, how would the data schema of bitcoin look like? Is it a single table database? If yes, which columns are inside this table?
The data is stored in an application-specific format optimized for compact storage, and wasn't really intended to be easily parsed by other applications.
See https://bitcoin.stackexchange.com/q/10814
For this custom format, see https://en.bitcoin.it/wiki/Protocol_documentation#block
There are various databases for various usages. As a reference client I would use bitcoin-core and describe its standard structure that is stored via the client. It actually uses "leveldb" and "berkleydb-4.8" for storing all kind of data.
Wallet database
Saves your transactions, generated public/private keys. That is usually encrypted ;)
Source: Wallets
Index Database
It's usually OPTIONAL, but usually stores a list of all transactions and in which block they occurred
Block Database
It's the most important db which locally stored and share via the network to communicate about newly created blocks and verify them. Every client has a copied version of it.
They usually store all blocks that ever occurred and also include fork-off blocks and also obsolete blocks.
Source: Blockchain / Transactions
Peers Database
Obviously there also is a database for all peers you have seen in the past. It rates each peer by giving it a ban-score, stores their IP addresses, ports and last seen status.
Conclusion:
That would be all databases. They mostly have "one table" which includes exactly the previously described data structures.
More information about the p2p network structure can be found right here.

Store data off-blockchain using a smart contract

I found a paper which is talking about a way to store data off-chain using the blockchain. The data are sent to the blockchain with a transaction which subsequently routes it to an off-blockchain store, while retaining only a pointer to the data on the public ledger.
In particular the paper says:
Consider the following example: a user installs an application that uses our platform for preserving her privacy. As the user signs up for the first time, a new shared (user, service) identity is generated and sent, along with the associated permissions, to the blockchain in a Taccess transaction. Data collected on the phone (e.g., sensor data such as location) is encrypted using a shared encryption key and sent to the blockchain in a Tdata transaction, which subsequently routes it to an off-blockchain key-value store, while retaining only a pointer to the data on the public ledger (the pointer is the SHA-256 hash of the data).
What I cannot understand is how they do it! If all the nodes on the blockchain have to execute that very transaction, it means that they all have to save those information off-blockchain causing a duplication of contents. Did I get it wrong?
After a quick glance at the paper in question, it makes no mention of storage replication. The use case they are describing here is to use blockchain transactions as references to physical data that is stored somewhere. The data can be accessed by anyone who has the reference to it; i.e. access to that particular blockchain system, however the data is encrypted such that only parties with the encryption key can actually decipher it. This approach allows for quick validation of data integrity while maintaining privacy.
From the perspective of the blockchain node all they see is a transaction that will be added to their local ledger, they don't actually save the data themselves.

Data storage on Corda

I am working on a use case. The requirement being I need to create an order. The order has some customer parameters. The order can be modified multiple times. Initially, I thought of implementing it in Ethereum. So, I thought of capturing the customer details from a UI and store it on the smart contract. However, the issue is once the contract is deployed I cannot change it since it is immutable. This drawback prevents me to use Ethereum. Deliberating on Corda, can I store the customer data as a single records and make modifications to it such that modifications are stored on the ledger which we can query. For instance, I want to store customer ID, Customer name, customer purchase order number, order type, and order status.
How would I do that in Corda data model?
Will that data be modifiable based on the rules coded on smart contract?
The data can be queried?
Yes.
In Corda, you store information using States. In your case, you might create a CustomerDataState class. You'd then create instances of this class on the ledger to represent your various customers.
This data can then be updated using transactions. But not all transactions are allowed. Which transactions are valid is based on the rules in the associated CustomerDataContract. The CustomerDataContract will be stateless, and simply exists to say how CustomerDataStates can evolve over time.
You can then easily query this data from your node's vault or transaction storage using an SQL-style syntax.