I'm implementing some simple machine learning algorithms on some financial data in C++, and would like to be able to present this in a 'professionel' way to a potential customer.
Does anyone know a good framework for displaying financial charts?
Or is there a simple way to do something else, like embedding gnuplot in a Qt widget?
If your customer is in finance, speak to them on their terms. Financial people do things in Excel and PowerPoint. Write your data in comma-separated values (CSV) format, import this into Excel, create some Excel plots, and pull these into a PowerPoint presentation.
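As a rough illustration of the CSV step, here is a minimal sketch in Python (the same layout can be produced from C++ with a plain output stream). The function name, the column names and the sample values are all made up for the example; use whatever columns your model actually produces.

```python
import csv
import io

def write_predictions_csv(rows, out):
    """Write (date, actual, predicted) rows as CSV with a header line."""
    writer = csv.writer(out, lineterminator="\n")
    # Hypothetical column names; Excel shows them as the first row.
    writer.writerow(["date", "actual", "predicted"])
    for date, actual, predicted in rows:
        # Fixed two-decimal formatting keeps the columns tidy in Excel.
        writer.writerow([date, f"{actual:.2f}", f"{predicted:.2f}"])

# Illustrative values only, not real market data.
sample = [("2024-01-02", 101.5, 100.9), ("2024-01-03", 102.25, 102.0)]
buffer = io.StringIO()
write_predictions_csv(sample, buffer)
csv_text = buffer.getvalue()
```

To produce a file that Excel can open directly, pass `open("predictions.csv", "w", newline="")` instead of the in-memory buffer.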
You might think of Excel and PowerPoint as being beneath someone who can develop machine learning techniques. Don't think that way. You are trying to sell a product, so you need to speak in the customer's lingo, not yours.
And do check for spelling errors in your presentation. 'Professionel' presentations do not have misspelled words.
Related
I am going to develop a generic C++/Qt GUI tool for data I/O from the user.
The data will be directed to/from the core application through a file.
However the same task could be performed by a spreadsheet. The only doubt I have is whether spreadsheets can save/load only the data that have changed since the last save/load operation, even in a temporary file.
I would like to know if this is a common feature among spreadsheets (especially the open source ones).
By spreadsheet, do you mean software like Excel or OpenOffice Calc? That is very different from a customized Qt application. I am sure you can do this with Excel or OpenOffice Calc. To decide which way to go, your other requirements matter more. Who should use the application, and for what purpose? Do you know the necessary programming languages/frameworks? Which functions should it implement?
Without a LOT more details you will not get a good answer here.
It seems that with spreadsheets (e.g. Excel, LibreOffice Calc, ...) it is not possible to save/load portions of the project, not even in a temporary delta file.
For these tasks, a database is the tool to use.
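To make the database suggestion concrete, here is a minimal sketch using SQLite through Python's standard `sqlite3` module. The table layout (`cells` with `row_idx`, `col_idx`, `content`) and the function names are assumptions made for this example, not something taken from the question. The point is that only the cells that changed are written, rather than the whole sheet.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open (or create) a store with one row per grid cell."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cells ("
        " row_idx INTEGER NOT NULL,"
        " col_idx INTEGER NOT NULL,"
        " content TEXT,"
        " PRIMARY KEY (row_idx, col_idx))"
    )
    return conn

def save_changes(conn, dirty):
    """Write only the changed cells; `dirty` maps (row, col) to a value."""
    with conn:  # one transaction, committed on success
        conn.executemany(
            "INSERT OR REPLACE INTO cells (row_idx, col_idx, content)"
            " VALUES (?, ?, ?)",
            [(r, c, v) for (r, c), v in dirty.items()],
        )

def load_cells(conn):
    """Read every stored cell back into a {(row, col): value} dict."""
    query = "SELECT row_idx, col_idx, content FROM cells"
    return {(r, c): v for r, c, v in conn.execute(query)}
```

The GUI would keep track of which cells were edited since the last save and pass just those to `save_changes`.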
I need to create a Windows application that has an Excel-like grid that users can enter data into, via keyboard or cut and paste. I would like to be able to expand and contract it in both axes on the fly. I'm just starting out programming Windows applications, so any pointers to examples or keywords that I can refine my search with would be extremely helpful.
Thanks,
James
Take a look at The Ultimate Grid. It has lots of features.
EDIT:
It used to be a commercial product, but it was later open sourced.
If you are using MFC, take a look here for a data grid control. I've used it several times and it did the job.
The requirement is to build a calculation engine which is performant and supports Excel-like formulas. These formulas need to be applied to huge data sets (millions of rows of data).
I was wondering whether something could be built on top of the OpenOffice Calc service and made available as a calculation engine.
Does anyone have any experience in doing this? Are there any other alternatives? I know it is possible using the Excel service, but we are an open source shop. M$ is ruled out.
Any pointers would be very helpful.
Edited based on High Performance Mark's inputs.
Numeric calculations are needed. Scientific calculations are not in scope (i.e., sin(x), tanh(x), etc.).
Calculations are not performed by end users. The formulas are stored in the DB and applied to the datasets. The formulas (like tax calculation) are configured, so if a formula changes, recalculation will be triggered via the application.
Spreadsheet-like formulas are well understood by a wider audience and should be easier to read and maintain. Is there any wrapper around R (or an equivalent) that will convert a spreadsheet formula into R syntax?
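To illustrate the setup described in the edit (formulas kept as text and applied row by row), here is a minimal sketch in Python. It is not an Excel formula parser: it only handles `+ - * /`, parentheses, numbers and column names, and the formula text and column names (`gross`, `allowance`, `rate`) are hypothetical. It is an assumption about how such an engine could start, not a description of any existing tool.

```python
import ast
import operator

# Only plain arithmetic is allowed; anything else is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def compile_formula(text):
    """Parse a formula once and return a function of one row (a dict)."""
    tree = ast.parse(text, mode="eval")

    def evaluate(node, row):
        if isinstance(node, ast.Expression):
            return evaluate(node.body, row)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return row[node.id]  # a column name looked up in the row
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            left = evaluate(node.left, row)
            right = evaluate(node.right, row)
            return _BINARY[type(node.op)](left, right)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -evaluate(node.operand, row)
        raise ValueError(f"unsupported formula element: {type(node).__name__}")

    return lambda row: evaluate(tree, row)

# Hypothetical formula and data, as they might come out of the DB.
tax = compile_formula("(gross - allowance) * rate")
rows = [{"gross": 1000.0, "allowance": 200.0, "rate": 0.25}]
results = [tax(row) for row in rows]
```

Parsing once and reusing the compiled function for every row matters at millions of rows. For that volume you would probably also want to evaluate whole columns at a time rather than one Python dict per row.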
Well, a little Googling finds several open-source spreadsheets written in Java, one of which may be suitable for your purposes. One of the questions you might want to answer, maybe by editing your post, is what calculations you want to perform: the full set of functionality that Excel provides (or something close), or would the facilities that SQL provides satisfy your requirements? If so, then you might want to put this in a database.
Another question you might clarify is this: are you trying to create an application which, like Excel, is usable by end users for specifying calculations but, unlike Excel, is based on open-source software and can cope with millions of rows? I don't know about its performance on such large data sets (someone else on SO can probably tell us), but R is very popular (and rightly so) for what you are probably trying to do. My view is that R sits between the average programming language (say Python) and the average spreadsheet (say Excel) in terms of ease of use by non-programmers.
Your choice of solution may (and certainly ought to) depend on who will be using it.
I have recently become interested in the field(s) of data mining and machine learning. The idea of going through huge datasets and trying to correlate hidden patterns and trends is fascinating. So far I have done the following:
Used Weka to load simple data sets and generate decision trees
Continuously read books, wikis, blogs and SO on the same
Started playing around with SQL Server DM and Python APIs
Got an idea of the freely available data sets on the web (freedb, UN, etc.)
What is hindering me is that the minute I try to go beyond classification/association and into priori/apriori algorithms I am stuck, because understanding mathematical equations and logic is not (to put it modestly) one of my strong points.
So my question is: is there anybody in the data mining field (in the role of product owner or builder) who is not naturally a mathematician? If so, how would you approach understanding the field, since free tools like Weka and RapidMiner both expect some mathematical/statistical background?
P.S: Excuse me if I made some mistake in the query like mixing Data mining and analytics when they are separate as I am still getting my feet wet. I hope my core question is clear.
Well, being able to do some analysis of what the data mining models are showing is absolutely vital. However, these days all of the math and statistics are taken care of by the data mining models. You don't need to understand the math behind them (although it helps).
For example, you can look through the SQL Server Analysis Services Data Mining Algorithms and see that even the technical reference is how to use these implementations, not how to recreate them.
If you can understand the business cases and you can understand what the data mining is telling you, there's really no need to delve into the math behind it.
As for some of the free tools, I've never used them, so I can't speak to them. However, I'm a big fan of SSAS and those data mining models, which don't require an extensive mathematical background.
As Eric says, as long as you only intend to use the existing algorithms and APIs and make sense of their output, I don't see a problem with the math/statistics skill set required (though you will still need some basic prior knowledge).
Now, if you intend to do research, or if you want to improve or modify existing algorithms or, why not, create your own, then math and statistics are a MUST. I just started doing some research in this area, and I'm still trying to fill my skills gap =)
I can find the technical explanation of what data mining is in a book or on Wikipedia, but I'm wondering what sort of development it actually involves. Is it more about using tools or more about writing tools? Is it really much different from other domains when it comes to R&D?
Data Mining is the process of discovering interesting patterns in large amounts of data. It is not querying data, which is just what user Treb describes (sorry Treb).
To understand DM from a developer's perspective, you should read the book Programming Collective Intelligence by Toby Segaran.
In my experience (I'm a former data miner :-)), it's a mixture of using tools and writing tools. A lot of the time, the tools you need to analyse the particular data set don't exist, so you have to write them yourself first. It can be very interesting but you often need quite a different approach to the sort of programming I do now (embedded wireless), for example.
You really ought to change the accepted answer on this question so it doesn't mislead those who come across it.
Saying that querying a database IS data mining because "[h]ow would you discover any pattern in your data without querying first?" is like saying opening your car door is driving because "how else would you be able to drive somewhere without opening the car door first."
You can read your data out of a text file if you want. My first data mining assignment used data sets from the UCI repository and those are almost all text files.
If you want to learn about data mining, start by looking up clustering and classification. Learn about decision trees and rule-based classification. Then look at k-nearest-neighbor and k-means. After that, if you really want to see what data mining is all about, look at Chameleon, DBSCAN, and Support Vector Machines. Don't necessarily learn the minutiae of the last three (they're pretty complex and math heavy), but understanding the abstract idea of what happens will tell you all you need to know in order to use the many tools and libraries that are available for each strategy.
These are only the algorithms that popped into my head just now. There are so many others that I don't recall or don't even know yet.
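Of the algorithms named above, k-nearest-neighbor is short enough to show in full. This is a minimal sketch in plain Python (3.8+ for `math.dist`), assuming numeric features and Euclidean distance. The function name and the toy points are made up for illustration, and a real library version would use a spatial index rather than sorting every point.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data: two well-separated groups, labelled "a" and "b".
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]
label = knn_predict(train, (1.1, 1.0))
```

There is no training step at all: the "model" is just the stored data, which is why k-NN is a good first algorithm to read before moving on to k-means and the heavier methods.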
Data mining is about searching large quantities of data for hidden patterns. Web 2.0 example: News Corp uses its site myspace.com as a large data mine to determine what movies and products to promote. They write software to identify trends in the data that its users post to the site. News Corp does this to gather information useful for advertising campaigns and market predictions. It's different from other domains of R&D in that, from the data giver's perspective, it's passive. Rather than going out on the street and asking people in person what movies they are likely to see this summer and other such questions, the data mining tools sort out these things by analyzing data given by users voluntarily.
Wikipedia actually does have a pretty good article on it:
- http://en.wikipedia.org/wiki/Data_mining
Data mining, as I say, is finding patterns or trends in given data. A developer's perspective might come from applications like anti-money laundering, where, given a pattern, you search the data for that pattern. Another use is in projection software, where you project a future result or outcome against a heuristic by studying and recognizing the current trend in the data.
I think it's more about using off-the-shelf tools than developing your own. An academic example of that kind of tool might be WEKA. Of course, you still have to know which algorithms to use, how to preprocess the data (this part is very important), etc.
About R&D I don't have much of an idea, but it should be like almost everything: maths, statistics, more maths...
On the development level, data mining is just another database application, but with a huge amount of data.
The mining itself is done by running specific queries on the database. It's in the creation of the queries where the important work is done. They of course depend on the data model, and on the hypotheses, what sort of trends the customer expects to find.
Therefore, the fine tuning of the queries usually can't be done in development, but only once the system is live and you have live data. Then the user can test his hypotheses and adapt the queries to show him the trends he is looking for.
So from a dev point of view, data mining is about:
Managing large sets of data in your client (one query may return 100,000 rows of data)
Providing the user (who may know nothing about SQL or relational databases in general) with an effective way to modify his queries and view the results.
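For the first point, one common way to keep a client responsive with large results is to pull rows in batches instead of all at once. Here is a minimal sketch using Python's standard `sqlite3` module; the helper name and the default batch size are assumptions for illustration, and other database drivers that follow the Python DB-API offer the same `fetchmany` call.

```python
import sqlite3

def iter_batches(conn, sql, params=(), batch_size=1000):
    """Yield query results as lists of at most `batch_size` rows."""
    cursor = conn.execute(sql, params)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break  # no more rows
        yield batch
```

The client can then display or aggregate each batch as it arrives, rather than holding 100,000 rows in memory before showing anything.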