How did these big companies start from these three main points? - web-services

So this is something that I've been thinking about lately: how did big music web apps or websites like Spotify, YouTube, or Anghami (if you know that one) start? I was actually thinking about three things. First: how did they get these huge music libraries? Second: did each of those big companies need to buy a special server to hold the website data and music library, and if so, how much does such a server cost? Third: how did they sort out copyright with all of the creators, authors, publishers, or whoever the copyright owners are in this case?

1. They are uploaded by the artists/creators. I'd imagine pre-release Spotify would have had a library already put together by working with the artists.
2. Yes, and they cost a lot. There are hundreds of millions of users and terabytes upon terabytes of data, spread around the world. Server costs will be in the millions. When starting out, the upfront cost of setting up the infrastructure would be very high too.
3. This is definitely not the place to ask this kind of question. I would Google how copyright agreements with artists usually work.

Related

OpenEdge 11.3 Application Migration

We have an application with 10 million lines of code in 4GL (Progress) and a database, also OpenEdge, with 300 tables. My boss says we should migrate it to a new programming language and a new database management system.
My questions are:
Do you think we should migrate it? Do you think Progress has a "future"?
If we should migrate it, how? Are there any tools, or should we start programming from scratch?
Thank you for the help.
Ablo
Unless your boss has access to an unlimited budget, endless user patience and a thirst for frustration and agony you should not waste any time thinking about rewrites.
http://www.joelonsoftware.com/articles/fog0000000069.html
Yes, Progress has a future. They probably will never be as sexy an option as Microsoft or Oracle or whatever the cool kids are using this week. But they have been around for 30 years and they will still be here when you and your boss retire.
There are those who will rain down scorn on Progress because it isn't X or it doesn't have Y. Maybe they can rewrite your 10 million lines of code next weekend and prove just how right they are. I would not, however, pay them for those efforts until after the user acceptance tests are passed and the implementation is completed.
A couple of years later (the original post is from 2014 and the answers are from 2014 to 2015):
The post with the most votes basically makes a two-fold argument:
a. Progress (Openedge) has been around for a long time and is not going anywhere soon
b. Unless your boss has access to an unlimited budget, endless user patience and a thirst for frustration and agony you should not waste any time thinking about rewrites: http://www.joelonsoftware.com/articles/fog0000000069.html
With regard to a:
Yes, the Progress OpenEdge stack is still around. But in my experience, finding experienced and skilled OpenEdge developers has become even more difficult.
There is also an important factor here which, I think, has become much more important since this discussion started:
The available open source stacks for application development have improved by factors, both in out-of-the-box functionality and in quality, and have moved decisively in the direction of RAD.
I am thinking, for instance, of Spring Boot, though not only of it; see https://stackshare.io/spring-boot/alternatives. In the Java realm Spring Boot is certainly unique. For the development of rich web UIs, many very valid options have also emerged that address RAD requirements. Just some "arbitrary" examples: https://vaadin.com for Java and https://www.polymer-project.org for JavaScript, which, interestingly, are converging in https://vaadin.com/flow.
Many of the available stacks are still evolving strongly, but all of them have making life easier for the developer as a strong driver. In terms of architecture, you will also find that many of these stacks converge on the same basic building blocks and principles: separation of interfaces from implementation, REST APIs for remote communication, object-relational mapping technologies, NoSQL/JSON approaches, and so on.
So yes, the open source stacks are getting very efficient in terms of development. It must also be mentioned that the scope of these stacks does not stop at development: deployment, operational aspects and, naturally, testing are a strong focus too, which in the end also makes the developer's life easier.
Generally, one can say that a well-chosen mix and match of open source stacks has a very strong value proposition, also against the background of RAD requirements, which a proprietary stack will find difficult to match in the long run, at least from my point of view.
With regard to b:
Interestingly enough, I was just recently with a customer who is looking to do exactly this: rewrite their application. The irony: they are migrating from Progress to Progress OpenEdge, with several additional OpenEdge-compliant tools. The reason is two-fold: their code is getting very difficult to maintain and would need refactoring in order to address requirements coming from web frontends. Also interesting: they are not finding enough qualified developers.
Basically: code is sound and lives when it can be refactored and when it can evolve with new requirements. Unfortunately there are many examples, at least in my experience, to the contrary.
Additionally, the end of a software lifecycle can force a company to "rewrite" at least layers of its software. And this doesn't necessarily have to be bad or impossible. I worked on a project which migrated over 300 Oracle Forms forms to a Java-based UI in less than two years. This migration from a 2-tier to a 3-tier architecture actually positioned the company to evolve its architecture to address the needs of web UIs. So in the end this "rewrite" had a strong return of value, also from the business perspective.
So to cut a (very;-)) long story short:
One way or another, it is easy to go wrong with generalizations.
You need not begin programming from scratch. There is help available online, and you can contact Progress Technical Support if you run into difficulties. Generally, ABL code from a previous version should work with only minor changes. Here are a few things that you need to do in order to migrate your application:
Backup databases
Backup source code and .r files
Truncate DB bi files
Convert your databases
Recompile ABL code and test
The articles at http://knowledgebase.progress.com will help you with this. If you are migrating from an older version like 9, you will find a good set of new features. You can try them, but only after you are done with your conversion.
If you are migrating from 32-bit to 64-bit and you are using 32-bit libraries, you need to replace them with 64-bit versions.
The first question I'd come back with is 'why'? If the application is not measuring up that's one thing, and the question needs to be looked at from that perspective.
If the perception is that Progress is somehow a "lesser" application development and operating environment, and the desire is only to move to a different development and operating environment - you'll end up with a lot of resources in time, effort, and money invested - not to mention the opportunity cost - and for what? To run on a different database platform? Will migrating result in a lower TCO? Faster development turn-around time? Quicker time to market? What's the expected advantage in moving from Progress, and how long will it take to recover the migration cost - if ever?
Somewhere out there is a company who had similar thoughts and tried to move off of Progress and the ABL. The effort failed to meet their target performance and functionality metrics, so they eventually gave up on the migration, threw in the towel, and stayed with Progress - after spending $25M on the project.
Can your company afford that kind of risk / reward ratio?
Progress (Openedge) has been around for a long time and is not going anywhere soon. And rewriting 10 Million lines of code in any language just to use the current flavor of the month would never be worth it unless your current application is not doing what you need. Even then bringing it up to current needs would normally be a better solution.
If you need to migrate your current application to the latest version of OpenEdge (Progress), you would normally just make a copy of your database(s), convert it/them to the new version of OpenEdge, compile your code against the new databases, and shake the bugs out. You may have some keyword issues, but these are usually pretty minor.
If you need help with programming I would suggest contacting Progress Software and attending the yearly trade show or going to https://community.progress.com/ and asking/looking for local user groups. The local user groups would be a stellar place to find local programming talent.
Hope this helps.....

Issue regarding practical approach on machine learning/computer vision fields

I am really passionate about the machine learning, data mining and computer vision fields, and I was thinking of taking things a little bit further.
I was thinking of buying a LEGO Mindstorms NXT 2.0 robot to experiment with machine learning, computer vision and robotics algorithms, in order to better understand several existing concepts.
Would you encourage me to do so? Do you recommend any other alternative for a practical approach to understanding these fields which is acceptably priced (around 200 - 250 pounds)? Are there any mini robots which I can buy and experiment with?
If your interests are machine learning, data mining and computer vision then I'd say a Lego mindstorms is not the best option for you. Not unless you are also interested in robotics/electronics.
To do interesting machine learning you only need a computer and a problem to solve. Think ai-contest or mlcomp or similar.
To do interesting data mining you need a computer, a lot of data and a question to answer. If you have an internet connection, the amount of data you can get at is only limited by your bandwidth. Think Netflix Prize, or try your hand at collecting and interpreting data from wherever. If you are learning, this is a nice place to start.
As for computer vision: all you need is a computer and images. Depending on the type of problem you find interesting, you could do some processing of random webcam images, or take all your holiday photos and try to detect where all your travel companions are in them. If you have a webcam, your options are endless.
Lego mindstorms allows you to combine machine learning and computer vision. I'm not sure where the datamining would come in, and you will spend (waste?) time on the robotics/electronics side of things, which you don't list as one of your passions.
Well, I would take a look at the irobot create... well within your budget, and very robust.
Depending on your age, you may not want to be seen with a "lego robot" if you are out of college :-)
Anyway, I buy the Creates in batches for my lab. You can link to them with a hard cable (cheap) or put a Bluetooth interface on them.
Put a webcam on that puppy, hook it up to a multicore machine and you have an awesome working robot for the things you want to explore.
Also, the old Roombas had a TTL-level serial port (if that did not make sense to you, then skip it). I don't know about the new ones. So, it was possible to control any Roomba vacuum from a laptop.
The Number One rule, and I cannot emphasize this enough: Have a reliable platform for experimentation. If you hand build something, just for basic functionality, you will spend all your time on minor issues and not get to the fun stuff.
Anyway, best of luck.

How exactly does sharkscope or PTR data mine all those hands?

I'm very curious to know how this process works. These sites (http://www.sharkscope.com and http://www.pokertableratings.com) data mine thousands of hands per day from secure poker networks, such as PokerStars and Full Tilt.
Do they have a farm of servers running applications that open hundreds of tables (windows) and then somehow spider/datamine the hands that are being played?
How does this work, programming wise?
There are a few options. I've been researching it since I wanted to implement some of this functionality in a web app I'm working on. I'll use PokerStars for example, since they have, by far, the best security of any online poker site.
First, realize that there is no way for a developer to rip real time information from the PokerStars application itself. You can't access the API. You can, though, do the following:
Screen Scraping/OCR
PokerStars does its best to sabotage screen/text scraping of their application (by doing simple things like pixel level color fluctuations) but with enough motivation you can easily get around this. Google AutoHotkey combined with ImageSearch.
API Access and XML Feeds
PokerStars doesn't offer public access to its API. But it does offer an XML feed to developers who are pre-approved. This XML feed offers:
PokerStars Site Summary - shows player, table, and tournament counts
PokerStars Current Tournament data - files with information about upcoming and active tournaments. The data is provided in two files:
PokerStars Static Tournament Data - provides tournament information that does not change frequently, and
PokerStars Dynamic Tournament Data - provides frequently changing tournament information
PokerStars Tournament Results - provides information about completed tournaments. The data is provided in two files:
PokerStars Tournament Results – provides basic information about completed tournaments, and
PokerStars Tournament Expanded Results – provides expanded information about completed tournaments.
PokerStars Tournament Leaders Board - provides information about top PokerStars players ranked using PokerStars Tournament Ranking System
PokerStars Tournament Leaders Board BOP - provides information about top PokerStars players ranked using PokerStars Battle Of Planets Ranking System
Team PokerStars – provides information about Team PokerStars players and their online activity
It's highly unlikely that these sites have access to the XML feed (or an improved one which would provide all the functionality they need) since PokerStars isn't exactly on good terms with most of these sites.
This leaves two options. Scraping the network connection for said data, which I think is borderline impossible (I don't have experience with this so I'm not sure; I've heard it's highly encrypted and not easy to tinker with, but I'm not sure) and, mentioned above, screen scraping/OCR.
Option #2 is easy enough to implement and, with some work, can avoid detection. From what I've been able to gather, this is the only way they could be doing such massive data mining of PokerStars (I haven't looked into other sites but I've heard security on anything besides PokerStars/Full Tilt is quite horrendous).
[edit]
Reread your question and realized I didn't unambiguously answer it.
Yes, they likely have a massive number of servers running, watching all currently running tables, tournaments, etc. Realize that there is a decent amount of money in what they're doing.
This, for instance, could be how they do it (speculation):
Said bot applications watch the tables and data mine all information that gets "posted" to the chat log. They do this by already having a table of images that correspond to, for example, all letters of the alphabet (since PokerStars doesn't post their text as... text. All text in their software is actually an image). So, the bot then rips an image of the chat log, matches it against the store, converts the data to a format they can work with, and throws it in a database. Done.
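To make the speculated "match against the store" step concrete, here is a minimal sketch. It assumes the chat-log image has already been thresholded to black and white and cut into a strip of fixed-width characters. The 3x3 glyph shapes and the names GLYPH_STORE, split_glyphs and read_strip are invented for illustration and have nothing to do with the real PokerStars fonts or with what these sites actually run.

```python
# Hypothetical store of known glyph bitmaps ('#' = dark pixel, '.' = light pixel).
# A real store would hold one bitmap per character as rendered by the client.
GLYPH_STORE = {
    ("#.#", "###", "#.#"): "H",
    ("###", ".#.", "###"): "I",
    ("#..", "#..", "###"): "L",
}

def split_glyphs(rows, width=3):
    """Cut a strip of pixel rows into fixed-width glyph tiles."""
    count = len(rows[0]) // width
    return [
        tuple(row[i * width:(i + 1) * width] for row in rows)
        for i in range(count)
    ]

def read_strip(rows):
    """Look each tile up in the store; unknown tiles become '?'."""
    return "".join(GLYPH_STORE.get(tile, "?") for tile in split_glyphs(rows))

# A strip containing two glyphs side by side.
strip = [
    "#.####",
    "###.#.",
    "#.####",
]
print(read_strip(strip))
```

This uses exact lookup, so it would break on the pixel-level colour fluctuations mentioned earlier; a real scraper would need some tolerance (thresholding or nearest-match) before the lookup.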
[edit]
No, the data isn't sold to them by the poker sites themselves. This would be a PR nightmare if it ever got out, which it would. And it wouldn't account for the functionality of these sites, which appears to be instantaneous. OPR, Sharkscope, etc. There are, without a doubt, applications running that are ripping the data real time from the poker software, likely using the methods I listed.
Maybe I can help.
I play poker, run a HUD, look at the stats and am a software developer.
I've seen a few posts on this suggesting it's done by OCR software grabbing the screen. Well, that's really difficult and processor hungry, so a programmer wouldn't choose to do that unless there were no other options.
Also, because you can open multiple windows, the poker window can be hidden or partially obscured by other things on the screen, so you couldn't guarantee to be able to capture the screen.
In short, they read the log files that are output by the poker software.
When you install a HUD like Sharkscope or Jivaro etc., they run client software on your PC. It reads the log files and updates its own servers with every hand you play.
Most poker software is similar, but let's start with PokerStars, as that's where I play. The poker software outputs to local log files for every action you/it makes. It shows your cards, any opponents' cards that you see, plus what you do, e.g. which button you have pressed, how much you/they bet etc. It posts these updates in near real time and timestamps the log file.
You can look at your own files to see this in action.
On a PC do this (not sure what you do on a Mac, but will be similar)
1. Load File Explorer
2. Select VIEW from the menu
3. Select HIDDEN ITEMS so that you can see the hidden data files
4. Go to C:\Users\Dave\AppData\Local\PokerStars.UK (you may not be called DAVE...)
5. Open the PokerStars.log.0 file in NOTEPAD
6. In Notepad, SEARCH for updateMyCard
7. It will show your card numerically
3c for 3 of Clubs
14d for Ace of Diamonds
You can see your opponents' cards only where you saw them at the table.
Here are a few example lines from the log file.
OnTableData() round -2
:::TableViewImpl::updateMyCard() 8s (0) [2A0498]
:::TableViewImpl::updateMyCard() 13h (1) [2A0498]
:::TableViewImpl::updatePlayerCard() 7s (0) [2A0498]
:::TableViewImpl::updatePlayerCard() 14s (1) [2A0498]
[2015/12/13 12:19:34]
cheers, hope this helps
Dave
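As a rough illustration of what a HUD client could do with those log lines, here is a minimal parsing sketch. The pattern is based only on the sample lines quoted above, so the real format may differ between client versions. Reading the number in parentheses as the card slot, and 11/12/13 as Jack/Queen/King, are assumptions (the post only confirms 14 = Ace); parse_cards is a hypothetical helper, not part of any HUD.

```python
import re

# Matches e.g. ":::TableViewImpl::updateMyCard() 8s (0) [2A0498]".
CARD_LINE = re.compile(r"update(My|Player)Card\(\)\s+(\d+)([cdhs])\s+\((\d+)\)")

# Assumption: 11-13 are face cards; only 14 = Ace is confirmed by the post.
RANKS = {11: "Jack", 12: "Queen", 13: "King", 14: "Ace"}
SUITS = {"c": "Clubs", "d": "Diamonds", "h": "Hearts", "s": "Spades"}

def parse_cards(lines):
    """Return (owner, slot, card name) for every card line found."""
    cards = []
    for line in lines:
        match = CARD_LINE.search(line)
        if not match:
            continue  # skip timestamps and unrelated log output
        owner, rank, suit, slot = match.groups()
        rank = int(rank)
        name = f"{RANKS.get(rank, rank)} of {SUITS[suit]}"
        cards.append(("me" if owner == "My" else "opponent", int(slot), name))
    return cards

sample = [
    "OnTableData() round -2",
    ":::TableViewImpl::updateMyCard() 8s (0) [2A0498]",
    ":::TableViewImpl::updateMyCard() 13h (1) [2A0498]",
    ":::TableViewImpl::updatePlayerCard() 7s (0) [2A0498]",
    ":::TableViewImpl::updatePlayerCard() 14s (1) [2A0498]",
    "[2015/12/13 12:19:34]",
]
for card in parse_cards(sample):
    print(card)
```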
I've thought about this, and have two theories:
The "sniffer" sites have every table open, AND:
Are able to pull the hand data from the network stream. (or:)
Are obtaining the hand data from the GUI (screen scraping, pulling stuff out via the GUI API).
Alternately, they may have developed/modified clients to log everything for them, but I think one of the above solutions is likely simpler.
Well, they have two choices:
They spider/grab the data without consent. Then they risk being shut down at any time. The poker site can easily detect monitoring at this scale and block it. They even risk a lawsuit for breach of the terms of service, which probably disallow the use of robots.
They pay to get the data directly. This saves a lot of bandwidth (e.g. not having to load the full pages, do the extraction, keep up with HTML changes etc.) and makes their business much less risky (legally and technically).
Guess which one they more likely chose; at least if the site has been around for some time without being shut down every now and then.
I'm not sure how it works, but I have an application ID and a key, which you get as a gold or silver subscriber. Sign up for a month and send them an email, and you will get access and the API documentation.

How can I start with data mining for small grocery shop

My company got a project to build a simple website for a grocery shop, with a catalogue only and no shopping cart. A few days ago I read something about data mining from here
I found that it is possible to do some predictive modelling, like this:
For example, one Midwest grocery chain used the data mining capacity of Oracle software to analyze local buying patterns. They discovered that when men bought diapers on Thursdays and Saturdays, they also tended to buy beer.
I told them about this example and they would be happy if I could do something like that.
Now I don't know how or where to start. I know MySQL and can write complex queries as well. But I don't know how I can get the type of data in the beer and diapers example.
I have 3-4 months left. Can anyone guide me on how I can start?
I also don't know what type of customer shopping data I can get from the shop; maybe Excel files.
But I want to start.
Judging by your question, you don't seem to know much, if anything, about data mining. That being said, you can get something usable running in 4 months, especially in a very restricted domain like a web shop, where all you are after is probably buying patterns for a start.
Please understand that you cannot expect some out-of-the-box solution that can be posted here in 10 lines of code, so I suggest you start by reading a decent book on the subject. I'd recommend:
Programming Collective Intelligence: Building Smart Web 2.0 Applications
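To give a feel for what "buying patterns" means in practice, here is a minimal sketch of the beer-and-diapers idea: count how often pairs of items appear in the same basket, then report support (share of all baskets containing the pair) and confidence (share of baskets with the first item that also contain the second). The baskets are made-up sample data and pair_rules is a hypothetical helper, not something from the book; real data would come from the shop's sales records.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.3):
    """Return (a, b, support, confidence) for rules 'a -> b' over item pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # too rare to be interesting
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    # Highest confidence first, then alphabetical for a stable order.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))

# Made-up baskets, one list of items per purchase.
baskets = [
    ["diapers", "beer", "chips"],
    ["diapers", "beer"],
    ["milk", "bread"],
    ["diapers", "milk"],
    ["beer", "chips"],
]
for a, b, support, confidence in pair_rules(baskets):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

This only looks at pairs; full association-rule algorithms such as Apriori extend the same counting idea to larger item sets.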

What is data mining from a developer's perspective?

I can find the technical explanation of what data mining is in a book or on Wikipedia, but I'm wondering what sort of development does it exactly involve? Is it more about using tools or more about writing tools? Is it really any much different from other domains when it comes to R&D?
Data Mining is the process of discovering interesting patterns in large amounts of data. It is not querying data, which is just what user Treb describes (sorry Treb).
To understand DM from a developer's perspective, you should read the book Programming Collective Intelligence by Toby Segaran.
In my experience (I'm a former data miner :-)), it's a mixture of using tools and writing tools. A lot of the time, the tools you need to analyse the particular data set don't exist, so you have to write them yourself first. It can be very interesting but you often need quite a different approach to the sort of programming I do now (embedded wireless), for example.
You really ought to change the accepted answer on this question so it doesn't mislead those who come across it.
Saying that querying a database IS data mining because "[h]ow would you discover any pattern in your data without querying first?" is like saying opening your car door is driving because "how else would you be able to drive somewhere without opening the car door first."
You can read your data out of a text file if you want. My first data mining assignment used data sets from the UCI repository and those are almost all text files.
If you want to learn about data mining start by looking up clustering and classification. Learn about decision trees and rule based classification. Then look at k-nearest-neighbor and k-means. After that if you really want to see what data mining is all about look at Chameleon, DBScan, and Support Vector Machines. Don't necessarily learn the minutiae of the last three (they're pretty complex and math heavy) but understanding the abstract idea of what happens will tell you all you need to know in order to use the many tools and libraries that are available for each strategy.
These are only the algorithms that popped into my head just now. There are so many others that I don't recall or don't even know yet.
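As a taste of how small the core idea of one of these can be, here is a minimal k-nearest-neighbor classifier in plain Python. It is a sketch under simple assumptions (numeric features, Euclidean distance, majority vote); the training points and labels are invented, and knn_predict is a hypothetical name rather than a library function.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Label the query point by majority vote among its k nearest neighbours."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up training data: (features, label).
train = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((2.0, 1.5), "small"),
    ((8.0, 8.0), "large"),
    ((8.5, 9.0), "large"),
    ((9.0, 8.5), "large"),
]
print(knn_predict(train, (2.0, 2.0)))
print(knn_predict(train, (8.0, 9.0)))
```

In practice you would more likely use an existing library for this, as suggested above; the point is only that the abstract idea is simple.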
Data mining is about searching large quantities of data for hidden patterns. Web 2.0 example: News Corp uses its site myspace.com as a large data mine to determine what movies and products to promote. They write software to identify trends in the data that its users post to the site. News Corp does this to gather information useful for advertising campaigns and market predictions. It's different from other domains of R&D in that, from a data giver's perspective, it's passive. Rather than going out on the street and asking people in person what movies they are likely to see this summer and other such questions, the data mining tools sort out these things by analyzing data given by users voluntarily.
Wikipedia actually does have a pretty good article on it:
- http://en.wikipedia.org/wiki/Data_mining
Data mining, as I see it, is finding patterns or trends in given data. A developer's perspective might come in with applications like anti-money-laundering, where, given a pattern, you search the data for that pattern. One other use is in projection software, where you project a future result or outcome against a heuristic by recognizing the current trend in the data.
I think it's more about using off-the-shelf tools than developing your own. An academic example of that kind of tool might be WEKA. Of course, you still have to know which algorithms to use, how to preprocess the data (this part is very important), etc.
As for R&D, I don't have much of an idea, but it should be like almost everything else: maths, statistics, more maths...
On the development level, data mining is just another database application, but with a huge amount of data.
The mining itself is done by running specific queries on the database. It's in the creation of the queries where the important work is done. They of course depend on the data model, and on the hypotheses, what sort of trends the customer expects to find.
Therefore, the fine tuning of the queries usually can't be done in development, but only once the system is live and you have live data. Then the user can test his hypotheses and adapt the queries to show him the trends he is looking for.
So from a dev point of view, data mining is about:
Managing large sets of data in your client (one query may return 100,000 rows of data)
Providing the user (who may know nothing about SQL or relational databases in general) with an effective way to modify his queries and view the results.
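For the first point, one common way to keep a client responsive is to pull a large result set in pages instead of all at once. Here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database; the sales table, its columns and the iter_pages helper are hypothetical and only stand in for whatever the real data model looks like.

```python
import sqlite3

def iter_pages(conn, sql, params=(), page_size=1000):
    """Yield the query result in lists of at most page_size rows."""
    cursor = conn.execute(sql, params)
    while True:
        rows = cursor.fetchmany(page_size)
        if not rows:
            break
        yield rows

# Hypothetical table filled with dummy data for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO sales (amount) VALUES (?)",
    [(i * 0.5,) for i in range(2500)],
)

pages = list(iter_pages(conn, "SELECT id, amount FROM sales ORDER BY id"))
print([len(page) for page in pages])
```

A real client would display each page as it arrives rather than collecting them all into a list as done here for the demonstration.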