Get the source code of the FP-Growth algorithm used in WEKA to see how it is implemented

I am currently working on a project that involves FP-Growth and I have no idea how to implement it. Is the source code of the FP-Growth implementation used in WEKA available anywhere, so I can study how it works?

Weka is indeed Open Source Software (OSS), and its source code is freely available via SVN hosted by the University of Waikato: http://www.cs.waikato.ac.nz/ml/weka/svn.html
To find a specific implementation, I would search the Weka Javadoc on SourceForge to identify the class: http://weka.sourceforge.net/doc.stable/. For FP-Growth, note the class hierarchy shown beneath the class name, which ends in weka.associations.FPGrowth.
Take that fully qualified name and locate it in SVN by traversing the package names (weka/associations/ under src/main/java/) in the version that you want.
Click on the link in SVN to open or download the source code. Here is the link for FP-Growth: https://svn.cms.waikato.ac.nz/svn/weka/tags/weka-stable-3.6.13/src/main/java/weka/associations/FPGrowth.java (for Weka 3.6.13).

You could also have a look at the FP-Growth implementation in the SPMF data mining library (I'm its founder), which specializes in pattern mining and offers FP-Growth along with many other algorithms. It is written in Java, it is easy to reuse and optimized, and, unlike some other implementations, it has no dependencies on other libraries.
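If you want to understand the algorithm itself before reading FPGrowth.java, here is a minimal from-scratch sketch in C++. It is not WEKA's or SPMF's code; the names FPNode, mine and fpGrowth are hypothetical, and it assumes each transaction lists an item at most once and that minSup is an absolute count. It builds an FP-tree (frequent items inserted in descending support order so that shared prefixes collapse into one path), then, for each item, recursively mines the conditional pattern base formed by the prefix paths above that item's nodes.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Minimal FP-Growth sketch (hypothetical names). Assumes distinct items per
// transaction and an absolute minimum support count.
struct FPNode {
    std::string item;
    int count = 0;
    FPNode* parent = nullptr;
    std::map<std::string, std::unique_ptr<FPNode>> children;
};

using Itemset = std::vector<std::string>;
using Weighted = std::vector<std::pair<Itemset, int>>;  // transaction + multiplicity

// Mines `db`, which is the (conditional) database projected on `suffix`.
void mine(const Weighted& db, int minSup, Itemset suffix,
          std::map<Itemset, int>& out) {
    // 1. Count item supports in this database.
    std::map<std::string, int> support;
    for (const auto& [tx, w] : db)
        for (const auto& item : tx) support[item] += w;

    // 2. Build the FP-tree from the frequent items of each transaction,
    //    ordered by descending support (ties broken by name).
    FPNode root;
    std::map<std::string, std::vector<FPNode*>> header;  // item -> its tree nodes
    for (const auto& [tx, w] : db) {
        Itemset items;
        for (const auto& item : tx)
            if (support[item] >= minSup) items.push_back(item);
        std::sort(items.begin(), items.end(), [&](const auto& a, const auto& b) {
            return support[a] != support[b] ? support[a] > support[b] : a < b;
        });
        FPNode* node = &root;
        for (const auto& item : items) {
            auto& child = node->children[item];
            if (!child) {
                child = std::make_unique<FPNode>();
                child->item = item;
                child->parent = node;
                header[item].push_back(child.get());
            }
            child->count += w;
            node = child.get();
        }
    }

    // 3. Each frequent item extends the suffix; its prefix paths form the
    //    conditional pattern base that is mined recursively.
    for (const auto& [item, nodes] : header) {
        Itemset found = suffix;
        found.push_back(item);
        std::sort(found.begin(), found.end());
        out[found] = support[item];

        Weighted conditional;
        for (FPNode* n : nodes) {
            Itemset path;
            for (FPNode* p = n->parent; p->parent != nullptr; p = p->parent)
                path.push_back(p->item);
            if (!path.empty()) conditional.push_back({path, n->count});
        }
        if (!conditional.empty()) mine(conditional, minSup, found, out);
    }
}

// Returns every itemset (sorted by name) whose support is at least minSup.
std::map<Itemset, int> fpGrowth(const std::vector<Itemset>& transactions, int minSup) {
    Weighted db;
    for (const auto& tx : transactions) db.push_back({tx, 1});
    std::map<Itemset, int> out;
    mine(db, minSup, {}, out);
    return out;
}
```

For example, with the transactions {bread, milk}, {bread, diaper, beer, eggs}, {milk, diaper, beer, cola}, {bread, milk, diaper, beer}, {bread, milk, diaper, cola} and minSup = 3, fpGrowth returns eight itemsets: bread, milk and diaper (support 4 each), beer (3), and the pairs {bread, milk}, {bread, diaper}, {diaper, milk} and {beer, diaper} (3 each).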

Related

Compare multiple files for common code

I have two projects, each with a massive code base. I'd like to run a tool that goes through all the files in both projects and shows me which files across the projects have similar code. I'm not even sure if anything like this exists, but I remember that in school, teachers had a tool they ran on all the code from multiple students to identify how similar their code was (to catch cheaters).
What you want is a clone detection tool. These tools find code that is duplicated across any set of files. For your task, you'd take the files from both projects and run clone detection across that set.
[EDIT 2019 based on real experience doing exactly what OP wants to do].
If a clone instance in a file from one project matches a clone instance in a file from the other project, you've found something the two projects have in common.
A drawback of running straight clone detection across all files from both projects is that you will also find many clones within a single project. Those aren't relevant to your question, i.e. they are false positives.
My company provides a commercial clone detector called CloneDR. It is (IMHO) an extremely good detector and will find clones that other detectors cannot (e.g. it isn't fooled by changes in comments or layout, number radixes, variable renaming, or even insertion or deletion of code fragments). But it has one other very nice property: it has an option to detect clones only across two project code bases. You won't get the false positives you'd get by treating the two projects as one.
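To make the "only across two code bases" idea concrete, here is a deliberately naive C++ sketch; it is not how CloneDR works. It removes whitespace and "//" comment lines, hashes every window of k consecutive lines, indexes the windows of project A only, and looks them up from project B, so duplication inside a single project is never reported on its own. Unlike the detector described above, it is fooled by renamed variables and inserted code. The Project/Match types and function names are hypothetical, and files are passed in memory (file name to contents) to keep it self-contained.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Project = std::map<std::string, std::string>;     // hypothetical: file name -> contents
using Lines = std::vector<std::pair<std::string, int>>;  // normalized text + original line number

struct Match {
    std::string fileA; int lineA;  // first line of the window in project A (1-based)
    std::string fileB; int lineB;  // first line of the window in project B (1-based)
};

// Removes all whitespace from each line and drops blank and "//" comment lines.
Lines normalize(const std::string& text) {
    Lines out;
    std::istringstream in(text);
    std::string line;
    for (int no = 1; std::getline(in, line); ++no) {
        std::string norm;
        for (char c : line)
            if (!std::isspace(static_cast<unsigned char>(c))) norm += c;
        if (norm.empty() || norm.rfind("//", 0) == 0) continue;
        out.push_back({norm, no});
    }
    return out;
}

// Joins k consecutive normalized lines starting at index i into one lookup key.
std::string windowKey(const Lines& lines, std::size_t i, std::size_t k) {
    std::string key;
    for (std::size_t j = i; j < i + k; ++j) key += lines[j].first + '\n';
    return key;
}

// Reports every k-line window of project B that also occurs in project A.
std::vector<Match> crossProjectClones(const Project& a, const Project& b, std::size_t k) {
    std::vector<Match> matches;
    if (k == 0) return matches;
    std::unordered_map<std::string, std::vector<std::pair<std::string, int>>> index;
    for (const auto& [file, text] : a) {
        Lines lines = normalize(text);
        for (std::size_t i = 0; i + k <= lines.size(); ++i)
            index[windowKey(lines, i, k)].push_back({file, lines[i].second});
    }
    for (const auto& [file, text] : b) {
        Lines lines = normalize(text);
        for (std::size_t i = 0; i + k <= lines.size(); ++i) {
            auto it = index.find(windowKey(lines, i, k));
            if (it == index.end()) continue;
            for (const auto& [fileA, lineA] : it->second)
                matches.push_back({fileA, lineA, file, lines[i].second});
        }
    }
    return matches;
}
```

A window duplicated between two files of project A is only reported when project B contains it as well, which is exactly the filtering the answer describes.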
Are you thinking of something like WinMerge? That can compare entire directory trees worth of files.
Many editors have side-by-side comparison tools. These are like embedded versions of WinMerge. Notepad++ and Sublime Text 2 come to mind.

Software for browsing classes in a C/C++ project

I have seen that some developers have a graphical representation of all their classes in an image file which comes with their project. How can I myself create these graphics?
Basically, they show which classes exist in each file and how these classes relate to each other.
Thanks
Doxygen allows you to generate interactive class diagrams. See http://www.doxygen.nl/manual/diagrams.html for how to set it up.
It might be Doxygen, a tool that generates documentation with dependency graphs.
Eclipse has the ability to generate class diagrams and export them as images.
One common tool that allows people to do that fairly easily is Dot, which comes as part of Graphviz. Dot is sort of a markup language describing graphs. You can generate Dot files manually if you like, but there are a lot of code analysis tools that will do the job for you. Doxygen is one.
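As an illustration of how simple a Dot file can be, here is a small C++ function (the name inheritanceToDot is hypothetical) that turns a hand-written list of (derived, base) pairs into a Dot digraph. rankdir, shape and arrowhead are standard Graphviz attributes; a tool like Doxygen would obtain the pairs by analyzing your code, whereas this sketch expects you to supply them.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Builds a Dot digraph from (derived class, base class) pairs.
std::string inheritanceToDot(const std::vector<std::pair<std::string, std::string>>& edges) {
    std::ostringstream dot;
    dot << "digraph classes {\n";
    dot << "  rankdir=BT;\n";              // edges point upward: base classes above derived ones
    dot << "  node [shape=box];\n";
    dot << "  edge [arrowhead=empty];\n";  // hollow arrowhead, as in UML inheritance
    for (const auto& [derived, base] : edges)
        dot << "  \"" << derived << "\" -> \"" << base << "\";\n";
    dot << "}\n";
    return dot.str();
}
```

Writing the returned string to classes.dot and running `dot -Tpng classes.dot -o classes.png` renders it as an image.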
Another great tool for creating UML class diagrams is Dia.

C/C++ Code Examples with HTK (Hidden Markov Toolkit)

I am trying to get started with HTK. I grabbed a copy, compiled it and got the book, and everything went more or less fine: a few small problems here and there, but nothing serious.
Now, after reading the book and googling for quite a while, I cannot find any documentation for the part that is essential for me: HTKLib. Everything is described in the smallest detail for all the HTK tool programs (scriptable command-line tools), but I cannot find a single example or tutorial on how to actually call the library.
Could anyone point me in the right direction?
The source code for the respective tools is included, but it would be rather cumbersome to have to extract the information for a reusable library by reading the source code... I would have expected a little more documentation, but maybe I simply overlooked it?
Any help is deeply appreciated,
Tom
edit:
I was trying to use HTK for computer vision purposes, not for NLP, and for that I needed to be able to link against it and call it from within my own code. Thanks for your replies.
Maybe ATK is more suitable for you. Here is the explanation from the ATK site:
"ATK is an API designed to facilitate building experimental applications for HTK. It consists of a C++ layer sitting on top of the standard HTK libraries."
In addition, Microsoft Research has another research tool for training acoustic models. It includes a Visual Studio project for HTKLib and a set of C++ HTK wrappers, but it may only cover a subset of the HTK functionality and has licence restrictions.
I have not used it, but I do use the language modeling toolkit. I think the main intention is for you to use the provided command-line tools; I imagine they are flexible enough to let you build and test models. Why do you want to use the code directly?
Also what are you trying to do?

C++ code to class diagram

Is there a way I can generate a hierarchical class diagram from C++ code? My code is spread over 5 to 6 .cpp files.
I would like to know if there is any free tool for this.
Regards,
AJ
There's Doxygen, for example.
http://www.doxygen.nl/manual/features.html says:
Uses the dot tool of the Graphviz tool kit to generate include dependency graphs, collaboration diagrams, call graphs, directory structure graphs, and graphical class hierarchy graphs.
It creates graphs like the one at http://www.vtk.org/doc/nightly/html/structvtkKdTree_1_1__cellList.html (an example listed on the Doxygen site).
Since the question was about class diagrams, you might also be interested in the UML_LOOK flag, which makes the output a bit more UML-like.
Class diagrams are networks, not hierarchies. There are quite a few tools that can generate them; my favourite is Enterprise Architect, but it isn't free (there is a trial).
Umbrello is a Linux application that generates diagrams from code.
Doxygen can create class diagrams. However, I believe these diagrams only show the network of classes; they do not list methods, members and so on.
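To illustrate the first step such tools perform (finding the relations between classes), here is a deliberately naive C++ sketch that pulls (derived, base) pairs out of source text with std::regex; the name extractInheritance is hypothetical. It is not a C++ parser: it mis-handles template arguments containing commas, macros, qualified names such as A::B, and enum class underlying types, which is why real tools like Doxygen or Umbrello parse the code properly. The resulting pairs could then be handed to Graphviz's dot to draw the diagram.

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Toy extractor: returns (derived, base) pairs from declarations of the form
// "class|struct Name [final] : base-specifier-list {". Not a real C++ parser.
std::vector<std::pair<std::string, std::string>> extractInheritance(const std::string& source) {
    static const std::regex decl(R"(\b(?:class|struct)\s+(\w+)\s*(?:final\s*)?:\s*([^{;]+)\{)");
    std::vector<std::pair<std::string, std::string>> edges;
    for (std::sregex_iterator it(source.begin(), source.end(), decl), end; it != end; ++it) {
        std::string derived = (*it)[1].str();
        std::istringstream bases((*it)[2].str());
        std::string spec;
        while (std::getline(bases, spec, ',')) {
            // Skip access/virtual keywords; the last remaining word is the base name.
            std::istringstream words(spec);
            std::string word, base;
            while (words >> word)
                if (word != "public" && word != "protected" &&
                    word != "private" && word != "virtual")
                    base = word;
            if (!base.empty()) edges.push_back({derived, base});
        }
    }
    return edges;
}
```

Classes without a base list (and forward declarations such as `class Logger;`) produce no pairs, so they would need to be added separately if you want them drawn as isolated nodes.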

Create class diagram from c++ source?

Are there any free tools for generating class diagrams from C++ source files, and if possible from MFC source files too?
We use Doxygen with Graphviz support.
You could try SourceNavigator. I'm not sure what the current state of the project is, but here's a place to start.
I've had some success with Umbrello (a KDE-based app). It allows you to import code to create a model, which can then be used to generate UML diagrams.
Umbrello is probably fine for projects with a limited number of classes, and certainly requires manual intervention for tuning. I imagine doxygen/graphviz is more suitable for larger projects.