What would the data schema of bitcoin look like? - blockchain

Since Bitcoin is a blockchain, and a blockchain has been described as a kind of database, what would the data schema of Bitcoin look like? Is it a single-table database? If so, which columns are in this table?

The data is stored in an application-specific format optimized for compact storage, and wasn't really intended to be easily parsed by other applications.
See https://bitcoin.stackexchange.com/q/10814
For this custom format, see https://en.bitcoin.it/wiki/Protocol_documentation#block

There are various databases for various purposes. Taking Bitcoin Core as the reference client, I will describe the standard structures it stores. It uses LevelDB and Berkeley DB 4.8 for storing all kinds of data.
Wallet database
Stores your transactions and your generated public/private keys. It is usually encrypted ;)
Source: Wallets
Index Database
It's optional, and stores a list of all transactions and the block in which each of them occurred.
Block Database
It's the most important database; it is stored locally and shared via the network so that nodes can communicate about newly created blocks and verify them. Every client has its own copy of it.
It stores all blocks that have ever occurred, including fork-off and obsolete blocks.
Source: Blockchain / Transactions
Peers Database
There is also a database of all the peers you have seen in the past. It rates each peer by giving it a ban score, and stores their IP addresses, ports and last-seen status.
Conclusion:
Those are all the databases. Each mostly has "one table" which contains exactly the data structures described above.
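To make the "single table" question concrete: if you flattened Bitcoin Core's block index into SQL, it would look roughly like the sketch below. This is purely illustrative - the client actually stores these records as key/value pairs in LevelDB, and the exact field set depends on the client version.

CREATE TABLE block_index (              -- illustrative only, not a real table
    block_hash   CHAR(64) PRIMARY KEY,  -- the LevelDB key
    height       INTEGER,
    version      INTEGER,               -- block header fields...
    prev_hash    CHAR(64),
    merkle_root  CHAR(64),
    time         INTEGER,
    bits         INTEGER,
    nonce        INTEGER,
    tx_count     INTEGER,               -- ...plus bookkeeping:
    file_number  INTEGER,               -- which blk*.dat file holds the raw block
    data_pos     INTEGER,               -- byte offset of the block in that file
    status       INTEGER                -- validation state flags
);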
More information about the p2p network structure can be found right here.

Related

How can I retrieve the entire old ethereum blockchain data for data processing

I want to get the "entire" Ethereum blockchain data, not just from a few sets of smart contracts. By data I mean transaction details, including the generated logs.
I can get real-time data using Infura, but it's pretty much impossible to fetch all the old data that way; it would simply cost too much because I would have to make too many network requests.
I need the old data because I am trying to make an indexed database out of the "append-only" ethereum transaction data so that I can easily query it.
To be more precise, I would like to retrieve all NFT(ERC721, ERC1155) transfer transactions and their logs. So that I can do the following queries and much more: all the NFT owned by a particular wallet, transfer histories of a particular NFT token.
You can do this by:
Running your own node
Querying data from your node - locally it is fast
For some data, you might need to run the node in archival mode
You can use the same Web3 / JSON-RPC APIs on a local node as you are using on Infura.
Two solutions I have discovered.
Just like @Mikko mentioned, you can run your own node, and it seems not to be as complex as I had expected. You can search for "geth" and then simply connect this node to your web3 library, just like connecting to Infura.
But I have not tried this, because I found a much better solution.
Google Cloud BigQuery's public dataset has all the old Ethereum data. BigQuery is Google's data warehouse service, where you can use simple SQL to query the data, and new data is added every day. I have already tested some simple queries from its console and the results were good.
I am planning to fetch all the old data I need from BigQuery, store it in my own database, and afterwards get real-time data from Infura. Now that I don't have to fetch all the old data from Infura, the price becomes very affordable.
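To give an idea of what a BigQuery query looks like, something along these lines against the public crypto_ethereum dataset returns the transfers received by one wallet (the address is just a placeholder; ERC721 Transfer events share the ERC20 event signature and therefore land in token_transfers, while ERC1155 transfers would have to be decoded from the logs table - double-check the current schema before relying on this sketch):

SELECT token_address, from_address, to_address,
       value,                 -- token amount for ERC20, token id for ERC721
       transaction_hash, block_number, block_timestamp
FROM `bigquery-public-data.crypto_ethereum.token_transfers`
WHERE to_address = '0x0000000000000000000000000000000000000000'  -- placeholder wallet
ORDER BY block_number;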
You may also check out https://github.com/blockchain-etl/ethereum-etl
It is a Python library for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, and internal transactions.
For example, you may run the cli command
> ethereumetl export_token_transfers --start-block 0 --end-block 500000 \
--provider-uri file://$HOME/Library/Ethereum/geth.ipc --output token_transfers.csv
You can export ERC20 and ERC721 transfers for a specific block range, which enables you to query the old data.
Data is also available in Google BigQuery.

ODBC Equivalent of DBMS_ALERT in Oracle

Is there anything (system procedure, function or other) in SQL Server that will provide the functionality of the DBMS_ALERT package of Oracle (and DBMS_PIPE, respectively)?
I work in a plant and I'm using an extension product of SQL Server called InSQL Server by Wonderware, which is specialized in gathering data from plant controllers and Human Machine Interface (SCADA) software.
This system can record events happening in the plant (like a high-temperature alarm, for example). It stores sensor values in extension tables of SQL Server, and other less dense information in normal SQL Server tables.
I want to be able to alert some applications running on operator PCs that an event has been recorded in the database.
An after insert trigger on the events table seems to be a good place to put something equivalent to DBMS_ALERT (if it exists), to wake up other applications that are waiting for the specific alert and have the operators type in some data.
In other words - I want to be able to notify other processes (that have connection to SQL Server) that something has happened in the database.
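For illustration, the trigger idea could look roughly like this if SQL Server's Service Broker turns out to be the right mechanism (an untested sketch; Service Broker must be enabled on the database, and every object and table name here - PlantEvents, EventAlertQueue, EventAlertService - is hypothetical):

CREATE QUEUE EventAlertQueue;
CREATE SERVICE EventAlertService ON QUEUE EventAlertQueue ([DEFAULT]);
GO
CREATE TRIGGER trg_PlantEvents_Alert ON PlantEvents AFTER INSERT AS
BEGIN
    DECLARE @dialog UNIQUEIDENTIFIER, @msg NVARCHAR(100);
    SELECT TOP (1) @msg = CAST(EventId AS NVARCHAR(100)) FROM inserted;  -- single-row sketch only
    BEGIN DIALOG CONVERSATION @dialog
        FROM SERVICE EventAlertService TO SERVICE 'EventAlertService'
        WITH ENCRYPTION = OFF;
    SEND ON CONVERSATION @dialog (@msg);   -- "signal" the alert
END;
GO
-- A waiting client session blocks here until an alert arrives (roughly like DBMS_ALERT.WAITANY):
WAITFOR (RECEIVE TOP (1) CAST(message_body AS NVARCHAR(100)) AS EventId
         FROM EventAlertQueue), TIMEOUT 60000;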
All Wonderware (InSQL, now called AVEVA) Historian data is stored in the history blocks EXCEPT for the actual tag storage configuration and dedicated event data. The time series data for analog, discrete and string tags is NOT in SQL tables at all - unless someone is doing custom configuration to create tables of their own.
Where do you want these notifications to come up? Even though the historical data is NOT stored in SQL tables, Wonderware has extensive documentation on how to use SQL queries to appropriately retrieve data (check for whatever condition you are looking for).
You can easily build a stored procedure and configure it for a maintenance plan.
But are you just trying to alarm (provide notification) on the scada itself?
Or are you truly utilizing historical data (looking for a data trend - average, etc.)?
Or trying to send the notification to non-scada interfaces?
Depending on your specific answer, the scada itself should probably be able to do it.
But there is software that already does this type of thing: Win-911, SeQent and Scadatec are a couple in the OT space. There are also things like Hip Link or even DeskAlert, which can connect to any SQL database via its own API.
So where does the info need to go (email, text, phone, desktop app...) and what is the real source of the data?

Where is the actual data stored in blockchain?

I know that the blockchain stores the transactional data and it is immutable, but where is the actual data stored?
If there is a use case to replace a centralized data center solution with a blockchain solution, where will the data be stored in blockchain?
Data centers usually have petabytes of raw data, hence I am assuming that a decentralized solution like blockchain won't be able to accommodate large amounts of data.
Note: Many links on google say that blockchain is not an ideal solution for large data, but then any solution will eventually produce an ever increasing amount of data.
In most blockchains, such as bitcoin, every node contains a full set of all data, allowing all nodes to verify previous and new transactions. The data itself is normally stored in a local database, typically leveldb.
As you assumed, for this reason, distributed databases (blockchains) that require a full copy of the dataset are not ideal for petabytes worth of data. The Bitcoin blockchain is currently roughly 270GB.

Store data off-blockchain using a smart contract

I found a paper that talks about a way to store data off-chain using the blockchain. The data is sent to the blockchain in a transaction, which subsequently routes it to an off-blockchain store, retaining only a pointer to the data on the public ledger.
In particular the paper says:
Consider the following example: a user installs an application that uses our platform for preserving her privacy. As the user signs up for the first time, a new shared (user, service) identity is generated and sent, along with the associated permissions, to the blockchain in a Taccess transaction. Data collected on the phone (e.g., sensor data such as location) is encrypted using a shared encryption key and sent to the blockchain in a Tdata transaction, which subsequently routes it to an off-blockchain key-value store, while retaining only a pointer to the data on the public ledger (the pointer is the SHA-256 hash of the data).
What I cannot understand is how they do it! If all the nodes on the blockchain have to execute that very transaction, it means that they all have to save that information off-blockchain, causing a duplication of content. Did I get it wrong?
After a quick glance at the paper in question, it makes no mention of storage replication. The use case they are describing here is to use blockchain transactions as references to physical data that is stored somewhere. The data can be accessed by anyone who has the reference to it (i.e. access to that particular blockchain system); however, the data is encrypted such that only parties with the encryption key can actually decipher it. This approach allows for quick validation of data integrity while maintaining privacy.
From the perspective of the blockchain nodes, all they see is a transaction that will be added to their local ledger; they don't actually save the data themselves.
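To make that split concrete, here is a minimal sketch of what the off-blockchain key-value store could look like if it were backed by SQL (the paper actually describes a DHT-style key-value store; table and column names here are made up, and BYTEA is PostgreSQL-flavoured):

CREATE TABLE offchain_store (
    data_hash      CHAR(64) PRIMARY KEY,  -- hex SHA-256 of the encrypted blob; this is the on-ledger pointer
    encrypted_blob BYTEA    NOT NULL      -- ciphertext; only holders of the shared key can decrypt it
);

-- The Tdata transaction on-chain carries only data_hash, so every node stores a
-- 32-byte pointer instead of the payload; retrieval happens off-chain:
SELECT encrypted_blob FROM offchain_store WHERE data_hash = '<hash taken from the ledger>';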

SOA/Web Service Pagination

In SOA we should not be building or holding state (or designing dependencies) between client and server. This is understood. But what patterns can be followed in the case that a client wants to consume a real-time service that may return an open ended number of 'rows'?
Web applications, similar to SOA but allowing for state (sessions) have solved this with pagination. Pagination requires (in most cases, especially with SQL) that the server holds the data and that the client request the data in chunks.
If we were to consider pagination-like scenarios for web services, what patterns would these follow that would still allow the tenets of SOA to be adhered to (or as close as possible)?
Some rules for the thinkers:
1) Backed by a SQL database (therefore there is no concept of a row number in a select set)
2) It is important to not skip a row or duplicate a row in a set during pagination
3) Data may be inserted and deleted at any time into the database by other clients
4) There is no need to consider the dataset a live (update-able) dataset
Personally, I think that 1 and 2 above already spell out the solution by constraining the solution space with the requirements.
My proposed solution would have the data (as much as is selected) stored in a read-only store/cache where it can be assigned a row number within the result set, allowing pagination to occur on this data snapshot. I would have infrastructure to store snapshots (servers, external caches, memcached or ehcache - this must scale quite large). The result of such a query would be a snapshot ID, and clients could retrieve the data from the snapshot using a snapshot API (web services) and the snapshot ID. Results would be processed in a read-only, forward-only manner for x records at a time, where x is something reasonable.
Competing thoughts and ideas, criticisms or accolades would be greatly appreciated.
Paginated results in a Web Service is actually quite easy to achieve.
All you have to do is add two parameters to the web service call: Page Size, Page Number.
Page Size is the number of results to include in a page. Page Number is the number of the page of results you are looking for.
Your web service then goes back to the database (or cache), retrieves the results, figures out which results fit on the requested page, and returns only those results.
The client then has to make a single request per page of results they want from the service.
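For example, with page size and page number as the two parameters, the query behind such a service typically reduces to something like the following (standard OFFSET/FETCH syntax; the exact dialect varies, table and column names are invented, and without a stable ORDER BY key rows can still shift between pages, which is why requirement 2 pushes towards a snapshot):

-- Page 3 with a page size of 25: skip (page_number - 1) * page_size rows.
SELECT order_id, customer_id, created_at
FROM orders
ORDER BY order_id
OFFSET 50 ROWS FETCH NEXT 25 ROWS ONLY;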
What you propose with memcached will also work with a caching table. The first service call would (1) INSERT the results INTO the caching table with a snapshot ID and (2) return the first page from the caching table along with the snapshot ID. Subsequent calls would return pages based on page size and page number by querying the caching table using the snapshot ID.
I should think this could also be optimized by using an in-memory caching table, but that depends on whether your database supports INSERT-INTO from a disk table to an in-memory table. That might get complicated in a clustered environment though.
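A rough sketch of that caching-table approach (table and column names are invented, and ROW_NUMBER/in-memory table support varies by database):

-- First call: materialize the snapshot with stable row numbers.
INSERT INTO result_cache (snapshot_id, row_num, payload)
SELECT 'a1b2c3', ROW_NUMBER() OVER (ORDER BY o.order_id), o.order_id
FROM orders o
WHERE o.status = 'OPEN';

-- Subsequent calls: a page of size 25 read from the frozen snapshot.
SELECT payload
FROM result_cache
WHERE snapshot_id = 'a1b2c3'
  AND row_num BETWEEN 51 AND 75
ORDER BY row_num;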
Such a cache is stateful by its very nature if you are retaining a client-specific copy between requests, whether the storage is a session object, a database table or a memcached data store. Given the requirements, though, you have no choice but to cache results in some form or another; otherwise you risk returning deleted or no-longer-relevant records as legitimate results.
SOA is not meant for such low level functionality.
SOA is meant to glue together business areas, not frontends to backends. Your application is not a "SOA" application just because it talks to the back end using web services. That is nonsense, since SOA is meaningless in the context of one isolated system.
From that point of view, it is then clear that, in SOA, the caller should not know about the SQL table you are paginating; that's an implementation detail that SOA should hide. On the other hand, the server should not know about the client's state, because it should be agnostic to the details of its clients in order to be truly open.
So, just understand that pagination is not SOA. Do as you wish, just understand that the web service you are using to paginate is an internal artifact of your application, not to be used by external clients on a SOA bus. Also remember that it cannot be transactionally consistent without state on the server. Probably the problem is that you have only one service layer for both the application's UI and the SOA bus; you need to separate them.
Using this web service on a SOA bus would be bad. It cannot stay consistent as the user paginates, and as other applications latch onto it they become tied to the specific SQL.
... then you might as well have granted direct SQL access to the table for all that matters.
SOA is for business messages between systems, not to glue an application's frontend to the backend.
Same problem, resolved using the Navision approach.
$ws->getList($first_record_id, $limit)
This returns a page of $limit elements starting from the passed id:
SELECT * FROM collection WHERE collection.id > $first_record_id ORDER BY id ASC LIMIT $limit
Navision uses a Key (each element has a key), but in MySQL an auto-increment id is better.
In this case pagination is intended for handling large result sets, not for frontend pagination...
I am not sure if SOA is the concern here. The problem you have seems to be with paginating your APIs. I will point you to how Twitter handles their pagination: dev.twitter.com/rest/public/timelines