How to manually create a decision tree in Weka

I would like to create my own decision tree model in Weka. In other words, I would like to manually specify all the splits and all the split values in the decision tree, without training any of the decision tree algorithms (e.g. REPTree or J48) on data. Is this possible in the Weka GUI or through Weka's Java API? If so, how?

Not that I'm aware of. You will need to write your own classifier, with options that define the splits you want to use (or an option that points to a file containing that information, which your classifier then parses).
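To make the "write your own classifier" suggestion more concrete, here is a minimal C++ sketch of the core data structure: a tree whose splits are read from a hand-written spec instead of being learned. C++ is used because it is the language of most threads here. The spec format ("split <attribute> <threshold>" / "leaf <label>", written in pre-order) and all the names are hypothetical. In Weka itself, this logic would have to live in a Java class, for example one extending weka.classifiers.AbstractClassifier, whose buildClassifier method could load such a spec instead of learning from the data.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// One node of a hand-specified binary decision tree.
struct Node {
    bool isLeaf = false;
    int attribute = -1;      // index of the attribute tested here (internal nodes)
    double threshold = 0.0;  // go left if value <= threshold, else right
    std::string label;       // predicted class (leaves only)
    std::unique_ptr<Node> left, right;
};

// Reads a tree written in pre-order from a hypothetical text spec:
//   "split <attribute> <threshold>" followed by its left and right subtrees,
//   "leaf <label>" for a terminal node.
std::unique_ptr<Node> parseTree(std::istream& in) {
    std::string kind;
    if (!(in >> kind)) throw std::runtime_error("unexpected end of tree spec");
    auto node = std::make_unique<Node>();
    if (kind == "leaf") {
        node->isLeaf = true;
        if (!(in >> node->label)) throw std::runtime_error("leaf without label");
    } else if (kind == "split") {
        if (!(in >> node->attribute >> node->threshold))
            throw std::runtime_error("malformed split");
        node->left = parseTree(in);
        node->right = parseTree(in);
    } else {
        throw std::runtime_error("unknown node kind: " + kind);
    }
    return node;
}

// Follows the hand-specified splits from the root down to a leaf.
std::string classify(const Node& root, const std::vector<double>& values) {
    const Node* n = &root;
    while (!n->isLeaf) {
        assert(n->attribute >= 0 &&
               static_cast<std::size_t>(n->attribute) < values.size());
        n = values[n->attribute] <= n->threshold ? n->left.get() : n->right.get();
    }
    return n->label;
}
```

Consider the spec `split 0 2.5 leaf small split 1 10 leaf medium leaf large`. It sends instances whose attribute 0 is at most 2.5 to "small". All other instances go to "medium" or "large", depending on whether attribute 1 is at most 10.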

Related

Can we do analytics on masked data using the WEKA tool?

I just want to ask whether there is any way to do analytics on masked data using the WEKA tool. In this case, not all data fields will be masked, only a few of them.
Thank you in advance!
If by masking you mean whether an attribute should be used by an algorithm for learning, then the answer is yes (in a sense).
However, rather than flagging attributes as included or excluded, in Weka you'd simply remove them.
For example, for classification you would use the FilteredClassifier meta-classifier together with the Remove filter and the actual base classifier that you want to use.
If you need additional filters to be applied (e.g., Normalize), then simply define a filter pipeline using MultiFilter as the filter in your FilteredClassifier setup. With that approach you can keep working with your original data, while the base classifier only sees the data that your filter pipeline outputs.
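Conceptually, a FilteredClassifier whose MultiFilter combines Remove and Normalize hands the base classifier a transformed copy of the data. The C++ sketch below only illustrates that idea and is not Weka code, since Weka's API is Java. Each filter maps a dataset to a new dataset, and a pipeline applies the filters in order. The Dataset type and the function names are assumptions made for illustration. There is one difference from Weka: inside FilteredClassifier, statistics such as Normalize's minimum and maximum are determined from the training data and reused for later instances. This sketch simply recomputes them on whatever data it receives.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <vector>

using Row = std::vector<double>;
using Dataset = std::vector<Row>;
using Filter = std::function<Dataset(const Dataset&)>;

// Drops the given column indices, like removing the "masked" attributes.
Filter removeColumns(std::set<std::size_t> drop) {
    return [drop](const Dataset& in) {
        Dataset out;
        for (const Row& row : in) {
            Row kept;
            for (std::size_t c = 0; c < row.size(); ++c)
                if (!drop.count(c)) kept.push_back(row[c]);
            out.push_back(kept);
        }
        return out;
    };
}

// Min-max scales every column to [0, 1]; constant columns become 0.
Filter normalize() {
    return [](const Dataset& in) {
        Dataset out = in;
        if (in.empty()) return out;
        const std::size_t width = in[0].size();
        for (const Row& r : in) assert(r.size() == width);  // rows must be rectangular
        for (std::size_t c = 0; c < width; ++c) {
            double lo = in[0][c], hi = in[0][c];
            for (const Row& r : in) {
                lo = std::min(lo, r[c]);
                hi = std::max(hi, r[c]);
            }
            for (Row& r : out) r[c] = (hi > lo) ? (r[c] - lo) / (hi - lo) : 0.0;
        }
        return out;
    };
}

// Applies the filters in order, like a MultiFilter chain.
Filter pipeline(std::vector<Filter> steps) {
    return [steps](const Dataset& in) {
        Dataset data = in;
        for (const Filter& f : steps) data = f(data);
        return data;
    };
}
```

For example, `pipeline({removeColumns({1}), normalize()})(raw)` first drops column 1 and then normalizes the remaining columns. This mirrors the idea that the base classifier only ever sees the pipeline's output.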

Decoupling GUI from a parsing engine

I am writing a parser for a hugely complex internal file format, for training purposes. I need to implement some sort of DSL for parsing and then display the sections of the format in a GUI as a tree view. I want to be able to have the parsing engine in one process and the UI in another. For example, I want the UI to ask the parser engine to parse a file. The parser then returns some sort of tree containing the sections and fields, and the GUI displays it.
How do I make them communicate? (The language is C++.) Should I make the engine a DLL and export the needed functions, or is there another way to implement this? I cannot use external libraries.
Not a big problem. The interaction is one-way, which helps a lot. Before writing either part, define an output format for the parser, which also serves as the input format for the UI. If the format is well defined, you can even have two developers write the two parts independently.
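As an illustration of "define the format first", here is a minimal C++ sketch of one possible line-based interchange format for the section tree. It uses only the standard library, since external libraries are ruled out. The BEGIN/FIELD/END keywords and the Section struct are hypothetical. How the text travels between the two processes (pipe, socket, temporary file) is a separate choice that this sketch does not cover.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// One node of the parse result: a named section with fields and subsections.
struct Section {
    std::string name;
    std::vector<std::pair<std::string, std::string>> fields;
    std::vector<Section> children;
};

// Writes the tree as plain text lines: "BEGIN <name>", "FIELD <key>=<value>", "END".
// Names, keys and values must not contain newlines; keys must not contain '='.
void writeSection(std::ostream& out, const Section& s) {
    assert(s.name.find('\n') == std::string::npos);
    out << "BEGIN " << s.name << '\n';
    for (const auto& field : s.fields)
        out << "FIELD " << field.first << '=' << field.second << '\n';
    for (const Section& child : s.children) writeSection(out, child);
    out << "END\n";
}

// Reads one top-level section (and everything nested in it) back from the stream.
// Lines that match none of the keywords are ignored.
Section readSection(std::istream& in) {
    std::vector<Section> stack;  // open sections, innermost last
    std::string line;
    while (std::getline(in, line)) {
        if (line.rfind("BEGIN ", 0) == 0) {
            stack.push_back(Section{line.substr(6), {}, {}});
        } else if (line.rfind("FIELD ", 0) == 0) {
            if (stack.empty()) throw std::runtime_error("FIELD outside a section");
            std::string kv = line.substr(6);
            std::size_t eq = kv.find('=');
            if (eq == std::string::npos) throw std::runtime_error("FIELD without '='");
            stack.back().fields.emplace_back(kv.substr(0, eq), kv.substr(eq + 1));
        } else if (line == "END") {
            if (stack.empty()) throw std::runtime_error("unbalanced END");
            Section done = std::move(stack.back());
            stack.pop_back();
            if (stack.empty()) return done;  // finished the top-level section
            stack.back().children.push_back(std::move(done));
        }
    }
    throw std::runtime_error("unexpected end of input");
}
```

Both sides depend only on writeSection/readSection and the text format. The parser and the UI can therefore be developed and tested separately.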

Representing an AVL Tree graphically

I implemented an AVL tree in C++. At the moment I print the AVL tree to the console, but I need to represent the tree in a GUI, as part of an application the user can use to interact with the tree. What libraries etc. should I look into in order to achieve this?
Note: I'm using OS X
The point here seems to be that some kind of user interaction is expected.
What kind of operations shall the user be able to invoke? Moving nodes, inserting, deleting?
You can go for the graphviz approach, but if you want user interaction, then with graphviz you should go for HTML output. That way you can, e.g., associate nodes with clickable links and put some operation logic behind them.
If that is not sufficient, then you will need to go for a generic GUI framework, and see what kind of libraries are available.
In the case of C++, Qt is one thing to look into. It has a tree view widget, QTreeView, that might fit your problem (see e.g. here: http://doc.qt.digia.com/qt/qtreeview.html).
However, be prepared that it will take you some time to get into Qt.
graphviz is a graph visualization toolkit. Writing graphviz files is really simple, and so is using one of the back-ends to render them to an image. You can then display those images with whatever toolkit you like.
graphviz could do the work.
And here is the document.
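Writing a graphviz file from a tree really does take only a few lines. The C++ sketch below emits DOT for a binary tree. The AvlNode struct is a hypothetical stand-in for your own node type, node ids assume unique keys, and the avl:// URL scheme used in the test is made up. The URL attribute is graphviz's way of attaching a link to a node in SVG or image-map output. This is one way to get clickable nodes.

```cpp
#include <cassert>
#include <initializer_list>
#include <ostream>
#include <sstream>
#include <string>

// Minimal stand-in for an AVL node; adapt the field names to your own tree.
struct AvlNode {
    int key;
    int height;
    AvlNode* left;
    AvlNode* right;
};

// Emits one node and its outgoing edges, recursing into both children.
// Node ids are derived from keys, so keys are assumed to be unique.
void writeDotNode(std::ostream& out, const AvlNode* n, const std::string& urlPrefix) {
    assert(n != nullptr);
    out << "  n" << n->key << " [label=\"" << n->key << " (h=" << n->height << ")\"";
    if (!urlPrefix.empty()) out << ", URL=\"" << urlPrefix << n->key << "\"";
    out << "];\n";
    for (const AvlNode* child : {n->left, n->right}) {
        if (child) {
            out << "  n" << n->key << " -> n" << child->key << ";\n";
            writeDotNode(out, child, urlPrefix);
        }
    }
}

// Writes a complete DOT digraph for the tree rooted at root (which may be empty).
void writeDot(std::ostream& out, const AvlNode* root, const std::string& urlPrefix = "") {
    out << "digraph AVL {\n  node [shape=circle];\n";
    if (root) writeDotNode(out, root, urlPrefix);
    out << "}\n";
}
```

Save the output to a file and render it with `dot -Tsvg tree.dot -o tree.svg`. For an HTML image map, render with `-Tcmapx` alongside a PNG. Note that graphviz does not distinguish left from right children. A node with only one child is therefore drawn without showing which side that child is on.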

How do I create levels for my puzzle game? Obj-C & Cocos2d

I want to create levels in my cocos2d game, and I don't know how to do that with .plist files. I searched the Internet, but unfortunately I couldn't find much useful information on how to implement these property lists. Can you please help out?
Check out Tiled Map Editor. Tiled's TMX format is supported by Cocos2D.
As with any Apple technology, the first place you should start searching is the developer.apple.com website. In this case, here's the Property List (plist) Programming Guide.
However, I find property lists very awkward to work with, especially if you want to create them manually and they contain more than just a few entries. It certainly can't hurt to consider rolling your own file format, plain and simple text. I would always rather work with simple text files like this one than mess with property lists:
X=10;Y=10;Tile=30;
X=12;Y=11;Tile=28;
X=16;Y=19;Tile=22;
It's a different story if you actually design the data with a tool or within an app, where you'll be able to make use of the various collection convenience methods that save and load property lists, for example to and from a dictionary or array.
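Parsing the plain-text level format suggested above takes little code. Here is a minimal C++ sketch that could be used from an Objective-C++ (.mm) file in a cocos2d project, or ported to NSString. The TilePlacement struct and the function names are assumptions. The X/Y/Tile keys are the ones from the example lines.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// One tile placement from the level file.
struct TilePlacement {
    int x;
    int y;
    int tile;
};

// Parses a single line such as "X=10;Y=10;Tile=30;".
// Throws std::runtime_error if a field lacks '=', std::out_of_range if X, Y or
// Tile is missing, and std::invalid_argument if a value is not a number.
TilePlacement parseTileLine(const std::string& line) {
    assert(line.find('\n') == std::string::npos);  // callers pass one line at a time
    std::map<std::string, int> values;
    std::istringstream fields(line);
    std::string field;
    while (std::getline(fields, field, ';')) {
        if (field.empty()) continue;  // tolerate stray ";;"
        std::size_t eq = field.find('=');
        if (eq == std::string::npos) throw std::runtime_error("missing '=' in: " + field);
        values[field.substr(0, eq)] = std::stoi(field.substr(eq + 1));
    }
    return TilePlacement{values.at("X"), values.at("Y"), values.at("Tile")};
}

// Reads a whole level, one placement per non-empty line.
std::vector<TilePlacement> parseLevel(std::istream& in) {
    std::vector<TilePlacement> level;
    std::string line;
    while (std::getline(in, line))
        if (!line.empty()) level.push_back(parseTileLine(line));
    return level;
}
```

Adding a new property later (say, a rotation) only means adding another key to the lines and reading it from the map.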

Best approach for doing full-text search with list-of-integers documents

I'm working on a C++/Qt image retrieval system based on similarity that works as follows (I'll try to avoid irrelevant or off-topic details):
I take a collection of images and build an index from them using OpenCV functions. After that, for each image, I get a list of integer values representing important "classes" that each image belongs to. The more integers two images have in common, the more similar they are believed to be.
So, when I want to query the system, I just have to compute the list of integers representing the query image, perform a full-text search (or similar) and retrieve the X most similar images.
My question is: what's the best approach to perform such a search?
I've heard about Lucene, Lemur and other indexing methods, but I don't know whether this kind of full-text search is the best way, given that the domain is reduced (only integers instead of words).
I'd like to know about the alternatives in terms of efficiency, accuracy or C++ friendliness.
Thanks!
It sounds to me like you have a vector space model, so Lucene or a similar product may work well for you. In general, an inverted-index model will be good if:
1. You don't know the number of classes in advance
2. There are a lot of classes relative to the number of images
If your problem doesn't fit these criteria, a normal relational DB might work better, as Thomas suggested. If it meets #1 but not #2, you could investigate one of the "column oriented" non-relational databases. I'm not familiar enough with these to tell you how well they would work, but my intuition is that you'll need to replicate a lot of the functionality in an IR toolkit yourself.
Lucene is written in Java and I don't know of any C++ ports. Solr exposes Lucene as a web service, so it's easy enough to access it that way from whatever language you choose.
I don't know much about Lemur, but it looks like it has a similar vector space model, and it's written in C++, so that might be easier for you to use.
You can take a look at LIRE (Lucene Image Retrieval) here: http://www.semanticmetadata.net/2006/05/19/lire-lucene-image-retrieval-04-released/
If I'm not mistaken, you are trying to implement typical bag-of-words image retrieval, correct? If so, you are probably trying to build an inverted file index. As you have probably already realized, Lucene on its own is not suitable, because it indexes text rather than numbers. Using its classes to query the index would also be a problem, because they are not designed to "parse" an image into the query vector (i.e. detect keypoints, extract descriptors, then vector-quantize them).
LIRE, on the other hand, has been modified to index feature vectors. However, it does not appear to work out of the box for the bag-of-words model. Also, I think I've read on the author's website that it currently uses brute-force matching rather than an inverted file index to retrieve images. Still, I would expect it to be easier to extend than Lucene itself for your purposes.
Hope this helps.
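For reference, the core of an inverted file index over integer "classes" is small enough to sketch without Lucene. The C++ sketch below is a hypothetical illustration, not Lucene's, Lemur's or LIRE's API. It maps each class id to the images that contain it. It then ranks images by the number of classes they share with the query, which is the similarity measure described in the question. Real IR toolkits usually weight terms (e.g. tf-idf) rather than counting raw overlap.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <unordered_map>
#include <utility>
#include <vector>

// Inverted file index: for every class id, the list of images that contain it.
class InvertedIndex {
public:
    // Adds an image described by its class ids and returns the new image id.
    std::size_t addImage(const std::vector<int>& classes) {
        std::size_t id = imageCount_++;
        for (int c : std::set<int>(classes.begin(), classes.end()))  // de-duplicate
            postings_[c].push_back(id);
        return id;
    }

    // Returns up to k (imageId, sharedClasses) pairs, most shared classes first.
    std::vector<std::pair<std::size_t, int>> query(const std::vector<int>& classes,
                                                   std::size_t k) const {
        std::vector<int> shared(imageCount_, 0);
        for (int c : std::set<int>(classes.begin(), classes.end())) {
            auto it = postings_.find(c);
            if (it == postings_.end()) continue;  // no indexed image has this class
            for (std::size_t image : it->second) {
                assert(image < shared.size());
                ++shared[image];
            }
        }
        std::vector<std::pair<std::size_t, int>> hits;
        for (std::size_t i = 0; i < shared.size(); ++i)
            if (shared[i] > 0) hits.emplace_back(i, shared[i]);
        // Highest overlap first; ties broken by image id for deterministic output.
        std::sort(hits.begin(), hits.end(), [](const auto& a, const auto& b) {
            return a.second != b.second ? a.second > b.second : a.first < b.first;
        });
        if (hits.size() > k) hits.resize(k);
        return hits;
    }

private:
    std::size_t imageCount_ = 0;
    std::unordered_map<int, std::vector<std::size_t>> postings_;
};
```

Only the posting lists for the query's classes are visited. Query cost therefore depends on how many images share those classes, not on comparing the query against every image. This sketch still allocates one counter per indexed image, though.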