Most articles describe blockchain as a distributed database. Does that mean we can store any type of data in a blockchain, such as audio, video, or PDF files?
Think of blockchain as a relatively slow (1), very expensive (2) database that provides excellent resistance to hacking and corruption. It's a Write-Once, Read-Mostly (WORM) system.
You absolutely could store any data you want in a hypothetical blockchain. The practical limits are that you don't want to store very large chunks of data (so, not video), and you probably don't want to store frequently changing data (so, not a thesis you're revising) -- unless it's somehow important to record every single change forever.
That's because its other defining feature is that once something is written to a blockchain, it's there forever.
Need to fix a typo? Then you add a new record with a correction.
Need to delete a record? Too bad, you can't. Best you can do is enter a new record saying that the record you wish to delete is "obsolete" or "repudiated" or "no longer valid" or "should be considered as deleted."
In short, it's wise to treat your blockchain as a permanent record.
(1) Slow: the Bitcoin blockchain runs at about 3 transactions per second (tps) and the Ethereum blockchain at about 30 tps.
(2) Expensive: the Bitcoin blockchain cost an average of US$8.22 per transaction in November 2017, according to Digiconomist.
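To make the "append a correction, never edit" pattern above concrete, here is a tiny sketch of an append-only ledger. The record fields are my own illustration, not any particular blockchain's format:

```python
# Minimal sketch of the append-only correction pattern described above.
# Records are never modified or deleted; a later record can only mark an
# earlier one as superseded. All field names are illustrative assumptions.
ledger = []

def append_record(data, supersedes=None):
    record = {"id": len(ledger), "data": data, "supersedes": supersedes}
    ledger.append(record)
    return record["id"]

def current_view():
    """Replay the ledger, hiding records that a later entry superseded."""
    superseded = {r["supersedes"] for r in ledger if r["supersedes"] is not None}
    return [r for r in ledger if r["id"] not in superseded]

first = append_record("Alice owes Bob 10")             # original entry
append_record("Alice owes Bob 12", supersedes=first)   # a correction is a new record, not an edit
print(current_view())  # only the corrected record is "current"; both remain stored forever
```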
It depends on the type of data. If it is a small string or JSON object, you can extend the block structure and store it directly on the chain. For pictures, videos, or other large files, store only a hash of the file on the blockchain and keep the original file in cloud storage.
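For the large-file case, a rough sketch of that pattern looks like this. The hashing uses Python's standard hashlib; the "store on chain" and "upload to cloud" calls are placeholders, and `lecture.mp4` is just an example file name:

```python
import hashlib

def file_digest(path, algorithm="sha256"):
    """Hash a file in fixed-size chunks so even huge files never sit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

digest = file_digest("lecture.mp4")       # e.g. a video you do NOT want on-chain
# store_on_chain(digest)                  # placeholder: only this short digest goes on-chain
# upload_to_cloud("lecture.mp4")          # placeholder: the big file lives in ordinary storage
# Anyone can later re-hash the downloaded file and compare it with the on-chain
# digest to verify the file was not tampered with.
```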
If you asked this question because "blockchain is a distributed database" is a statement commonly used when explaining blockchain in blogs and video tutorials, here is some further clarification:
1. Blockchain is not a distributed database technology if you are comparing it with RDBMS/NoSQL databases.
2. Blockchain is somewhat like a distributed database if you consider that it has distributed nodes in the network, each holding a consistent copy of the ledger. These distributed ledgers can be maintained on top of any database technology, and they also leverage cryptography to provide decentralized multi-version concurrency control and to maintain consensus about what exists.
Refer to the link for further explanation of blockchain as a distributed database and other related topics.
It's probably better to think of the blockchain as a distributed ledger, i.e., ledger data that is shared among a number of actors. The reason the DB analogy doesn't work is addressed by one of the other answers: all changes have to be additions or amendments, because the ledger itself is immutable. Any database that can't modify data is hobbled, to say the least; however, the blockchain is more about an unchanging historical record than it is about storing data for manipulation.
You can put whatever data you want onto the blockchain, but considering how data is added to the blockchain and the fact that ALL CHANGES are recorded, the smaller the data the better.
The first application of blockchain was Bitcoin. The main idea behind blockchain is decentralization. It consists of blocks, and each block contains information about the previous block as well as its own data. Whatever information you store (like audio, video, or PDF) has to be hashed (given a digital signature).
You can think of it like this: car-sharing companies nowadays are trying to bring blockchain into their systems. Once you rent a car, your information is stored persistently and immutably with the car. The next renter will see information about the previous user, which will help them drive more safely :) or something along those lines.
A blockchain is just a data structure that is composed of blocks. These blocks form a chain. This is a distributed ledger, which means that every "node" or computer in the network has a copy of the ledger.
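As a rough illustration of "blocks that form a chain", here is a toy sketch in Python. It is not how Bitcoin or any production chain actually serializes blocks, just the hash-linking idea:

```python
import hashlib, json, time

def make_block(data, previous_hash):
    """Each block records its data plus the hash of the previous block."""
    block = {"timestamp": time.time(), "data": data, "previous_hash": previous_hash}
    block["hash"] = hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()
    return block

genesis = make_block("genesis", previous_hash="0" * 64)
second = make_block("some payload", previous_hash=genesis["hash"])
# Changing anything in `genesis` changes its hash, which breaks the link
# stored in `second` -- that is what makes tampering detectable.
print(second["previous_hash"] == genesis["hash"])
```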
The blockchain utilizes the functionality of a distributed database to commit transactions between the peer nodes that are part of the ecosystem. It's not plain distributed computing; it adds encryption, nodes, ledgers, digital signing, and many other things on top. You could say it's a skyscraper built on what distributed computing does.
Note that there are both private and public blockchain networks, such as IBM Hyperledger Fabric, Ethereum, and R3 Corda.
Recently, I have been doing research on Bitcoin since it is fascinating. I came up with a few questions related to it that I would highly appreciate someone answering.
I do not understand how transactions can be verified by just using the Merkle root. The block headers only contain the Merkle root, but to verify that the transactions in the block are valid, you still have to hash all of the transactions and compare the result with the Merkle root. Am I missing something?
It seems the Bitcoin source code can be updated: https://github.com/bitcoin/bitcoin/blob/master/CONTRIBUTING.md. If that is the case, how can we say that Bitcoin is a permanent, decentralized store of value? We don't know how the system will change in the future. Also, who is in charge of developing Bitcoin? If there is an institution in charge of it, how can we say it is fully decentralized?
Thanks!
The transaction identifier, a.k.a. txid, is generated by hashing the serialized transaction data twice with SHA-256. The merkle root is the result of a merkle tree that takes all txids in the block as inputs. To paraphrase the excellent learnmeabitcoin website:
"[Merkle root] gives you a short yet unique fingerprint for all the transactions in a block".
Yes, the Bitcoin source code can be and has been updated for years. In Bitcoin's history, there have been plenty of groups that modified (forked) the Bitcoin source code and run their own version of the repository.
Bitcoin forks are listed in various places, and some of them are still running. We don't know how the system will change in the future, but if you don't like the changes, you are free to fork your own version and gather a community around it. The community is crucial: the contributors (miners, developers, writers …) and users give strength to the project. As long as there is a community around Bitcoin, it can exist.
I have an application which needs to get intraday stock quotes for several assets (indices, commodities, etc.).
I want to be able to query the data over HTTP and get it in CSV/XML format.
For example, I'd like to be able to ask the data provider what the last bid/ask/price on GE (General Electric) was at 4:00 PM, and ask it at, say, 4:05 PM on the same day, for further processing.
Similar services to what I'm looking for:
Reuters' DataLink service can give me this data for the last trade of the day.
I need the data to flow all day long -- intraday.
Yahoo Finance (the query format within it) is a great service which does what I want in terms of data delivery, yet I'm unsure about its reliability/timing since it's free.
Also, I couldn't find any information regarding how delayed their data is relative to real time (many websites serve this data with a delay of ~20 min).
QuoteRSS offers this for free as well; it lets me pick a ticker and get its data, yet once again I'm unsure about its reliability and timing, and I doubt whether it is "realtime" or close to it.
Finally, this blog post by Google, "At long last, real-time stock quotes are here", claims to offer free data on certain stocks, but I can't find anything about it on Google Finance's pages or in their API documentation, and again, who knows what delay I'd get from the real-time data.
In addition to these concerns about the above-mentioned services (Yahoo, QuoteRSS & Google), I'm not sure how, or whether, they provide intraday information on stocks, which is something I need.
Worth mentioning is that many websites which deal with Forex claim to be getting their data feed from Reuters/Bloomberg.
I didn't find such a solution on either site. I even chatted online with a sales rep at Reuters to ask about it, and his answer, after a decent discussion, was that "he's afraid he cannot offer me anything better than their DataLink service". How odd!
So, to summarize my question:
1) Where do I get such a data feed, in which I select several tickers from several markets and get closer-than-20-minute information on them, in a concise format (CSV/XML)?
2) If Reuters/Bloomberg offer it (I'll probably also call them later), where is it offered? On their websites? I'd like to get the data from a "big name" such as these, for reliability reasons.
3) Regarding "realtime" or not, it depends on the cost. What costs should I prepare for? I'm assuming that a realtime feed costs a LOT, so is there an option between realtime and the 20-minute delayed feed? Something like a 2-5 minute delay?
4) Please mention how, or whether, I can query for a stock's data at a specific time, like "what was the price of GOOG at 4:00 PM?".
Note #1:
Please keep in mind, when answering, that I need the quotes intraday and not "by the end of the day".
Note #2:
If Google/Yahoo do actually offer this kind of service for free, how do I find it directly? I don't mind starting with these free options for testing and such, especially if I can query for data at a specific time as mentioned above ("what was the price of GOOG at 4:00 PM?").
Note #3:
In terms of licensing, I do not intend to resell this information. Simple as that.
Before they closed shop, I used opentick. My blog post about opentick shutting down got quite a bit of traffic, so I decided to write another post that examined some potential opentick alternatives. Take a look at the companies in the post and comments. Hopefully one of them will work for you.
I have used IQFeed for some time. It is not HTTP or CSV, but a streaming push of ticks from their servers to you. The client is a bit kludgy, but overall I find it acceptable for the price. This type of feed would be considered "realtime" by most people, and since you are talking about minutes, I assume you are not worried about a couple of seconds of latency here or there.
I have experience with Reuters (Thomson) feeds. They are expensive, since we are now talking about TotalView/OpenBook data. This would be used to reconstruct the history of the order book and could be used for analyzing things like the liquidity of an equity at different price levels. I had a good experience with them at a previous job: 24/7 engineering support, fixes, a decent security database. The reality is that there is a wide variety of ways to get these feeds, mostly from brokerages. I don't think this is what you are looking for, since you mentioned things that were free.
There are "mid tier" providers like CQG although I have no experience with them.
In general, no matter whom you use, you need to be willing to implement their protocol and format. I have found this to be true no matter which feed I use. The good news is that all you need to do is write a parser.
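For example, if a provider delivered intraday quotes as CSV over HTTP, the parsing side stays small. The column layout below is purely an assumption for illustration; every vendor defines its own schema:

```python
import csv
from io import StringIO

# Hypothetical payload: symbol, timestamp, bid, ask, last -- not any vendor's real format.
payload = """symbol,timestamp,bid,ask,last
GE,2012-06-01T16:00:00,19.87,19.88,19.88
GOOG,2012-06-01T16:00:00,580.10,580.25,580.20
"""

def parse_quotes(text):
    """Turn one CSV response body into typed quote records."""
    for row in csv.DictReader(StringIO(text)):
        yield {
            "symbol": row["symbol"],
            "timestamp": row["timestamp"],
            "bid": float(row["bid"]),
            "ask": float(row["ask"]),
            "last": float(row["last"]),
        }

for quote in parse_quotes(payload):
    print(quote["symbol"], quote["last"])
```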
What was the price of Google at 4:00 PM? Who can say. Which part of 4 PM? Would the price at 4 PM be the final print to the tape of the closing auction? Is it the auction midpoint? The price is what you can transact at, which can be very different than what you see printed. ;-P
A final note: if you are building a trading system of some sort, pay for your data. It should be cleaner than trying to assemble it yourself. The exchanges charge for data and there is no real way around it. If you can't afford a couple of hundred bucks a month for data, then you probably don't have enough capital to be trading.
Concerning Bloomberg, I just called them & they said that they only provide market data for personal use. So you cannot show it on your site, but you can do whatever you want with it as long as you don't publish it.
I'm looking for some architecture ideas on a problem at work that I may have to solve.
The problem:
1) Our enterprise LDAP has become a "contact master" filled with years of stale data and unused, unmaintained attributes.
2) Management has decided that LDAP will no longer serve as a company phone book; it is for authorization purposes only.
3) The company has contact-type data about people in hundreds of different sources. We need to scrub all the junk out of LDAP and give the other applications a central repo to store all this data about a person.
The ideal goal:
1) Have a single source to store all the various attributes about a person.
2) The company probably has info on 500K people (read: 500K rows).
3) I estimate there could be 500 to 1,000 optional attributes on these people (read: 500+ columns).
4) Data would primarily be set/get via XML over JMS (this infrastructure is already in place).
5) Individual groups within the company could "own" columns; only they would be allowed to write to their columns, and they would be responsible for keeping the data clean.
6) A single-record lookup should be returned in sub-second time.
7) The system should support 1 million requests per hour at peak.
8) The primary goal is to serve real-time data to the enterprise; reporting is a secondary goal.
9) We are a Java, Oracle, Teradata shop -- your typical big IT shop.
My thoughts:
1) Originally I thought LDAP might work, but it doesn't scale when new columns are added.
2) My next thought was some kind of NoSQL solution, but from what I have read, I don't think I can get the performance I need, and it's still relatively new. I'm not sure I can get my manager to sign off on something like that for such a critical project.
3) I think there will be a metadata component to the solution that tracks who owns each column, what each column represents, and the original source system.
Thanks for reading, and thanks in advance for any thoughts.
SQL
With Teradata-grade tools, an SQL-based solution may be feasible. I came across an article on database design a while ago that discussed "anchor modeling".
Basically, the idea is to create a single, dumb, synthetic primary key table, while all real or meta data lives in other tables (subsets) and is attached by way of a foreign key + join.
I see the benefit of this design being two-fold. First, you can more easily compartmentalize data storage either for organizational or performance reasons. Second, you only create additional rows for records that have data in any given subset, so you use less space and indexing and searching are faster.
Subsets might be based on maintainer or some other criteria. XML set/get would be per-subset/record (rather than global record). All subsets for a given records can be composited and cached. Additional subsets can be created for metadata, search indexes, etc., and these can be queried independently.
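A very rough sketch of that anchor-style layout, using SQLite purely as a stand-in for illustration. The table and column names are invented; a real Teradata design would of course differ:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The "dumb" anchor: one synthetic key per person, nothing else.
    CREATE TABLE person_anchor (person_id INTEGER PRIMARY KEY);

    -- Each owning group gets its own attribute subset, joined by foreign key.
    CREATE TABLE hr_attributes (
        person_id INTEGER REFERENCES person_anchor(person_id),
        department TEXT,
        office_phone TEXT
    );
    CREATE TABLE facilities_attributes (
        person_id INTEGER REFERENCES person_anchor(person_id),
        building TEXT,
        desk TEXT
    );
""")
conn.execute("INSERT INTO person_anchor (person_id) VALUES (1)")
conn.execute("INSERT INTO hr_attributes VALUES (1, 'Engineering', 'x1234')")

# Only subsets that actually hold data for a person contain a row, and a full
# record is assembled by joining the subsets onto the anchor.
row = conn.execute("""
    SELECT a.person_id, h.department, f.building
    FROM person_anchor a
    LEFT JOIN hr_attributes h ON h.person_id = a.person_id
    LEFT JOIN facilities_attributes f ON f.person_id = a.person_id
""").fetchone()
print(row)  # (1, 'Engineering', None)
```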
NoSQL
NoSQL seems similar to LDAP (in theory, at least) but the benefit of a good NoSQL tool would include greater abstraction of metadata, versioning, and organization. In fact, from what I've read it seems that NoSQL datastores are designed to address some of the issues you've raised with respect to scaling and loosely structured data. There's a good question on SO regarding datastores.
Production NoSQL
Off-hand, there are a handful of large companies using NoSQL in massively-scaled environments, such as Google's Bigtable. It seems like the perfect tool for:
6) A single-record lookup should be returned in sub-second time.
7) The system should support 1 million requests per hour at peak.
Bigtable is only available (to my knowledge) through AppEngine. Other, similar technologies are listed here.
Other Thoughts
The bigger picture view looks more or less the same regardless of the technology you decide to use. E.g. compartmentalize storage, composite views, cache views, stick metadata somewhere so you can find things.
The performance characteristics you're targeting are going to require some kind of caching and/or optimization based on real-world usage patterns. Regardless of the solution you choose, you probably can't resolve that in the design phase.
A couple thoughts:
1) Our enterprise LDAP has become a "contact master" filled with years of stale data and unused, unmaintained attributes.
This isn't really a technological problem. You will have this problem with a new system as well, LDAP or not.
"LDAP ... doesn't scale"
There are lots of huge LDAP systems out there. LDAP is surely a dark art, but I'd be willing to bet that it scales better than any SQL equivalent in this situation. Not to mention that LDAP is a standard for this kind of info, and as such it is accessible from zillions of different kinds of systems.
Maybe what you're looking for is a new LDAP system that's easier to manage / has better admin tools?
You may want to look into Len Silverston's Party Model. Here's a link to his book: http://www.amazon.com/Data-Model-Resource-Book-Vol/dp/0471380237.
I have no experience building something on that scale, though I think that thinking of it as 500k rows x 500 - 1000 columns sounds a bit ridiculous.
I can find a technical explanation of what data mining is in a book or on Wikipedia, but I'm wondering what sort of development it actually involves. Is it more about using tools or more about writing tools? Is it really much different from other domains when it comes to R&D?
Data Mining is the process of discovering interesting patterns in large amounts of data. It is not querying data, which is just what user Treb describes (sorry Treb).
To understand DM from a developer's perspective, you should read the book Programming Collective Intelligence by Toby Segaran.
In my experience (I'm a former data miner :-)), it's a mixture of using tools and writing tools. A lot of the time, the tools you need to analyse the particular data set don't exist, so you have to write them yourself first. It can be very interesting but you often need quite a different approach to the sort of programming I do now (embedded wireless), for example.
You really ought to change the accepted answer on this question so it doesn't mislead those who come across it.
Saying that querying a database IS data mining because "[h]ow would you discover any pattern in your data without querying first?" is like saying opening your car door is driving because "how else would you be able to drive somewhere without opening the car door first."
You can read your data out of a text file if you want. My first data mining assignment used data sets from the UCI repository and those are almost all text files.
If you want to learn about data mining start by looking up clustering and classification. Learn about decision trees and rule based classification. Then look at k-nearest-neighbor and k-means. After that if you really want to see what data mining is all about look at Chameleon, DBScan, and Support Vector Machines. Don't necessarily learn the minutiae of the last three (they're pretty complex and math heavy) but understanding the abstract idea of what happens will tell you all you need to know in order to use the many tools and libraries that are available for each strategy.
These are only the algorithms that popped into my head just now. There are so many others that I don't recall or don't even know yet.
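If you want a feel for what one of those strategies looks like in code, here is a deliberately tiny k-means sketch in plain Python. Real work would use an existing library (WEKA, scikit-learn, etc.) rather than hand-rolled code like this:

```python
import random

def kmeans(points, k, iterations=100):
    """Very small k-means sketch: points is a list of (x, y) tuples."""
    centroids = random.sample(points, k)
    for _ in range(iterations):
        # Assignment step: group each point with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append((sum(x for x, _ in cluster) / len(cluster),
                                      sum(y for _, y in cluster) / len(cluster)))
            else:
                new_centroids.append(old)  # keep empty clusters where they were
        if new_centroids == centroids:     # stop once nothing moves
            break
        centroids = new_centroids
    return centroids

# Tiny usage example with two obvious groups of points.
data = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
print(kmeans(data, k=2))
```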
Data mining is about searching large quantities of data for hidden patterns. A Web 2.0 example: News Corp uses its site myspace.com as a large data mine to determine which movies and products to promote. They write software to identify trends in the data that its users post to the site. News Corp does this to gather information useful for advertising campaigns and market predictions. It's different from other domains of R&D in that, from a data giver's perspective, it is passive. Rather than going out on the street and asking people in person what movies they are likely to see this summer and other such questions, the data mining tools sort these things out by analyzing data that users give voluntarily.
Wikipedia actually does have a pretty good article on it:
- http://en.wikipedia.org/wiki/Data_mining
Data mining, as I would put it, is finding patterns or trends in given data. From a developer's perspective, it might show up in applications like anti-money-laundering, where, given a pattern, you search the data for that pattern. Another use is in projection software, where you project a future result or outcome against a heuristic by studying and recognizing the current trend in the data.
I think it's more about using off-the-shelf tools rather than developing your own. An academic example of that kind of tool might be WEKA. Of course, you still have to know which algorithms to use, how to preprocess the data (this part is very important), etc.
About the R&D side I don't have much of an idea, but it should be like almost everything else: maths, statistics, more maths...
On the development level, data mining is just another database application, but with a huge amount of data.
The mining itself is done by running specific queries on the database. The important work is done in the creation of those queries. They of course depend on the data model and on the hypotheses: what sort of trends the customer expects to find.
Therefore, the fine-tuning of the queries usually can't be done during development, but only once the system is live and you have live data. Then the user can test his hypotheses and adapt the queries to show him the trends he is looking for.
So from a dev point of view, data mining is about:
- Managing large sets of data in your client (one query may return 100,000 rows of data)
- Providing the user (who may know nothing about SQL or relational databases in general) with an effective way to modify his queries and view the results.
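On the first point, the usual trick is to stream the result set in batches instead of materializing every row at once. A small sketch with Python's DB-API, using an in-memory SQLite table only as a stand-in for whatever database the client actually talks to:

```python
import sqlite3

# In-memory stand-in database; in practice this would be the real warehouse connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer_id INTEGER, product_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [(i, i % 7, i * 1.5) for i in range(100_000)])

cursor = conn.execute("SELECT customer_id, product_id, amount FROM sales")
total = 0.0
while True:
    batch = cursor.fetchmany(10_000)   # pull 10k rows at a time instead of all 100k at once
    if not batch:
        break
    for customer_id, product_id, amount in batch:
        total += amount                # stand-in for feeding rows into the client-side view
print(total)
```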
Is there any library (or even better, web service) available which can convert from a latitude/longitude into a time zone?
I looked fairly deeply into this question for a project I am working on. GeoNames.org and EarthTools.com are both good options for many situations but with the following serious flaws:
GeoNames.org finds the time zone by searching for the nearest point in their database that contains a time zone field. This often leads to the wrong result near borders. It is also painfully slow, leading to query times on the order of a couple seconds per request. It also doesn't return a valid time zone if there is no item in their database near the query point. GeoNames also restricts the number of queries that can be made per day, making bulk operations difficult.
EarthTools.org uses a map and is able to return queries quickly, but it doesn't take into account daylight savings time for most locations, and it returns a raw offset rather than a time zone ID (i.e., they return "GMT-7" instead of "America/Chicago"). Also, I just looked at their page while preparing this post and Google Chrome warned about malware on their site. That is new to me and it may change, but is obviously a cause for concern.
These flaws meant that these existing tools were not suitable for my needs so I rolled my own solution and have published it for general use. You can find it here:
http://www.askgeo.com/
AskGeo is based on a time zone map of the world, so it returns a valid time zone for every valid latitude and longitude. It returns the standard time zone ID (e.g., "America/Los_Angeles") used on Linux and most other operating systems and programming frameworks. It also returns the current offset, taking full account of daylight savings time.
It is extremely easy to use and usage is documented on the main page of the site. The API supports batch queries, so if you need to do a lot of look-ups, please use the batch interface rather than bog down our servers with serial requests. The bulk queries are also much faster, so everybody wins.
When we first launched this, we built it on Google App Engine (GAE) and made it free to all users. This was possible because GAE's prices were so low at that time. Since then, our server load has increased substantially and GAE's prices went way up. Both factors combined led us to switch to Amazon Web Services for hosting and to start charging for commercial use, while keeping the service free for non-profit, non-commercial open source projects, and researchers. For commercial users, we provide 1000 free queries to let potential customers evaluate the API to make sure it meets their needs. See the web site for pricing and terms.
The underlying library was written in Java and due to popular demand, we also released the library under a commercial license. Full documentation of the library and pricing details are on the web site.
I hope this is useful. It certainly was useful for the project I was working on.
Take a look at Geonames.org
It's a free web service that allows you to get a lot of information from a long/lat pair.
They also provide a free (and open source) Java client library for the GeoNames web services (clients for other languages are also provided: Ruby, Python, Perl, Lisp...).
Here's some of the info you can get from long/lat (complete list of web services here):
Find nearest Address
Find nearest Intersection
Find nearby Streets
Elevation
Timezone
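For the timezone lookup specifically, GeoNames exposes a simple HTTP endpoint. Here is a sketch using only Python's standard library; the `demo` username is a placeholder (you register your own), and the endpoint details may have changed since this was written:

```python
import json
from urllib.request import urlopen
from urllib.parse import urlencode

def geonames_timezone(lat, lng, username="demo"):
    """Query the GeoNames timezone web service for a lat/long pair."""
    params = urlencode({"lat": lat, "lng": lng, "username": username})
    with urlopen(f"http://api.geonames.org/timezoneJSON?{params}") as resp:
        return json.load(resp)

info = geonames_timezone(48.8566, 2.3522)
print(info.get("timezoneId"), info.get("gmtOffset"))  # e.g. Europe/Paris 1
```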
Timezones are now available via Google API
https://developers.google.com/maps/documentation/timezone/
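A minimal request sketch against that API is below. The key is a placeholder, and you should check Google's current documentation for exact parameters and quotas:

```python
import json, time
from urllib.request import urlopen
from urllib.parse import urlencode

def google_timezone(lat, lng, api_key="YOUR_API_KEY"):
    """Look up the time zone for a coordinate via the Google Time Zone API."""
    params = urlencode({
        "location": f"{lat},{lng}",
        "timestamp": int(time.time()),  # used to decide whether DST applies at that moment
        "key": api_key,
    })
    url = f"https://maps.googleapis.com/maps/api/timezone/json?{params}"
    with urlopen(url) as resp:
        return json.load(resp)

result = google_timezone(37.7749, -122.4194)
print(result.get("timeZoneId"), result.get("rawOffset"), result.get("dstOffset"))
```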
The Yahoo places API provides timezone information via reverse geolocation.
Check it out.
http://developer.yahoo.com/geo/placefinder/guide/requests.html
Eric Muller has made shapefile maps for the timezones of the tz (Olson) database. A few minor caveats, though:
The boundaries used are often unofficial.
It isn't updated as regularly as the tz database itself, so some newly-formed or -adjusted zones may be missing.
Those aside, however, it seems to be very accurate for most purposes.
How much accuracy do you need? Dividing the longitude by 15 would almost be right :p
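The joke has a kernel of truth: the naive solar-time approximation is just the longitude divided by 15 (360 degrees / 24 hours). A one-liner sketch, ignoring every political boundary and DST, of course:

```python
def naive_utc_offset(longitude):
    """Crude solar-time offset in hours: 15 degrees of longitude per hour."""
    return round(longitude / 15)

print(naive_utc_offset(2.35))     # Paris -> 0 (real answer is +1/+2, hence only "almost" right)
print(naive_utc_offset(-122.42))  # San Francisco -> -8
```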
Project dead :-/
These look pretty promising:
Archive link:
https://web.archive.org/web/20150503145203/http://www.earthtools.org/webservices.htm
DRT Engine takes a latitude, longitude and local datetime and returns a timezone offset. This can be used to establish the timezone of a particular location at a future date.