Association Mining in Weka Using Only 1s

I'm trying to apply association mining using Apriori in Weka on a data set that looks like this:
A B C
1 0 1
0 0 1
1 0 0
But it's only finding rules where the values are 0, while I only want rules involving 1s.
How can I get around this? I don't want rules where the absence of something indicates the absence of something else, but rather ones where, for example, the presence of A indicates the presence of C.

Try replacing the 0s with missing values ('?' in ARFF) instead. If I recall correctly, this will then produce the desired results, since rules are only built from the values that are actually present. But I haven't used this for a long time, because Weka is just so much slower than ELKI or SPMF. Weka would just die on my data sets, whereas the other two worked fine.

Related

Is it possible to use ClickHouse to implement an efficient union-find algorithm?

I have a typical union-find problem where I have to group records, but it involves multiple files with hundreds of billions of records.
Can I somehow use the ClickHouse database to solve it?
Edit - minimal reproducible example:
I have three columns (item_id, from, to) which represent graph edges between nodes.
I want to create groups (id, group_id, item_id) which assign each item a group id derived from the disjoint sets.
[Data]
item_id from to
0 101 102
1 102 103
2 104 105
[Result]
id group_id item_id
0 0 0
1 0 1
2 1 2
There are only two groups #0 (101->102->103) and #1 (104->105).
The problem with an in-memory implementation is that there are too many records, and I want ClickHouse (or some other solution) to take care of the filesystem caching.
Without knowing more about your specific data and questions, it is tricky to give a definitive answer. In general, this is a moderate data size for ClickHouse, and UNION is fully supported. Your best bet is to simply try it: loading or generating data is straightforward, and SQL queries can usually be translated from PostgreSQL/MySQL easily.
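For reference, here is a minimal in-memory union-find (disjoint-set) sketch in C++, run on the three example rows from the question. It only illustrates the grouping being asked about, not a ClickHouse-based solution; at hundreds of billions of records you would need an external-memory or database-backed variant, which is exactly the open question here.

```cpp
#include <cstdint>
#include <iostream>
#include <unordered_map>
#include <vector>

// Minimal disjoint-set (union-find) with path compression.
struct DisjointSet {
    std::unordered_map<int64_t, int64_t> parent;

    int64_t find(int64_t x) {
        if (!parent.count(x)) parent[x] = x;          // lazily add new nodes
        if (parent[x] != x) parent[x] = find(parent[x]);  // path compression
        return parent[x];
    }

    void unite(int64_t a, int64_t b) { parent[find(a)] = find(b); }
};

int main() {
    // (item_id, from, to) rows from the question's minimal example.
    struct Row { int64_t item_id, from, to; };
    std::vector<Row> rows = {{0, 101, 102}, {1, 102, 103}, {2, 104, 105}};

    DisjointSet ds;
    for (const auto& r : rows) ds.unite(r.from, r.to);

    // Assign dense group ids in order of first appearance of each set's root.
    std::unordered_map<int64_t, int64_t> group_of_root;
    for (const auto& r : rows) {
        int64_t root = ds.find(r.from);
        auto it = group_of_root.find(root);
        if (it == group_of_root.end())
            it = group_of_root.emplace(root, (int64_t)group_of_root.size()).first;
        std::cout << "item " << r.item_id << " -> group " << it->second << '\n';
    }
}
```

On the example data this prints items 0 and 1 in group 0 and item 2 in group 1, matching the expected result table.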

Caffe SoftmaxWithLoss Error

I get this error message when I try to solve my neural network:
Check failed: label_value < prob_.shape(softmax_axis_) (1 vs. 1)
My labels are all either 0 or 1. When I tried out this example it worked with 0 and 1 labels. So my assumption is that the error is in the second part:
prob_.shape(softmax_axis_)
I looked it up in the source code, but I don't understand how my source code or prototxt files influence this value.
Can someone explain what is going on and how I can get my softmax layer to accept labels with a value of 1?
When using "SoftmaxWithLoss" layer to predict binary labels, your "class-probability" vector should by of length 2 (and not 1).
You are getting an error saying your "class-probability" vector (aka "prob_") is of dimension 1 while it should be at least 2 (that is strictly larger than largest label).
Check num_output parameter in the layer producing the class probabilities.
Alternatively, for binary classification, consider using "SigmoidCrossEntropyLoss"

Using cascade-correlation neural networks (retraining)

I have a problem that I would like to solve using neural networks. I have a basic understanding of how cascade-correlation networks work, but I am not sure whether I can use them in an example without complete retraining.
For example, say I want to train on the XOR problem, but I only have the first three input/output triplets:
0 0 0
0 1 1
1 0 1
I understand how to train the network on these inputs/outputs, but say I then want to add a fourth triplet:
1 1 0
without completely retraining the whole network. If I understand the algorithm correctly, this should be possible, but I haven't found an appropriate C++ library or MATLAB toolbox that implements it.
I don't know of any implementations, but people have made online versions of cascade correlation, meaning the network is updated continuously as new training data comes in rather than being trained once on a static dataset.
I'm not sure exactly how those versions work; I believe they just add new hidden neurons every so often. You could also backpropagate through the whole thing as if it were a normal feed-forward network.

Simple Curve Fitting Implementation in C++ (SVD Least Squares Fit or similar)

I have been scouring the internet for quite some time now, trying to find a simple, intuitive, and fast way to approximate a 2nd degree polynomial using 5 data points.
I am using VC++ 2008.
I have come across many libraries, such as cminpack, cmpfit, lmfit, etc., but none of them seem very intuitive and I have had a hard time implementing them.
Ultimately, I have a set of discrete values stored in a 1D array, and I am trying to find the 'virtual max point' by fitting a curve to the data and then locating the maximum of that curve at a non-integer position (whereas just reading the array only gives the maximum at an integer index).
Anyway, if someone has done something similar to this, and can point me to the package they used, and maybe a simple implementation of the package, that would be great!
I am happy to provide some test data and graphs to show you what kind of stuff I'm working with, but I feel my request is pretty straightforward. Thank you so much.
EDIT: Here is the code I wrote which works!
http://pastebin.com/tUvKmGPn
Change size to change how many inputs are used. Sample input and output:
0 0
1 1
2 4
4 16
7 49
a: 1 b: 0 c: 0
Press any key to continue . . .
Thanks for the help!
Assuming that you want to fit a standard parabola of the form
y = ax^2 + bx + c
to your 5 data points, then all you need is to solve a 3 x 3 matrix equation. Take a look at this example http://www.personal.psu.edu/jhm/f90/lectures/lsq2.html - it works through the same problem you seem to be describing (only with more data points). If you have a basic grasp of calculus and are able to invert a 3x3 matrix (or do something nicer numerically, which I am guessing you can, given that you refer specifically to SVD in your question title), then this example will clarify what you need to do.
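For illustration, here is a self-contained C++ sketch of that normal-equations approach: it builds the 3x3 system from the sample points in the question, solves it with simple Gaussian elimination (a plain stand-in for a proper SVD solve), and reads the extremum of the fitted parabola off x = -b/(2a). Names and structure are illustrative, not taken from any particular library.

```cpp
#include <algorithm>
#include <cmath>
#include <iostream>

// Fit y = a*x^2 + b*x + c to n points by solving the 3x3 normal equations.
void fitParabola(const double* x, const double* y, int n,
                 double& a, double& b, double& c) {
    // Augmented matrix [A | rhs]: A[i][j] = sum of x^(4-i-j),
    // rhs[i] = sum of y * x^(2-i), using the basis functions x^2, x, 1.
    double A[3][4] = {{0}};
    for (int k = 0; k < n; ++k) {
        double p[3] = { x[k] * x[k], x[k], 1.0 };
        for (int i = 0; i < 3; ++i) {
            for (int j = 0; j < 3; ++j) A[i][j] += p[i] * p[j];
            A[i][3] += p[i] * y[k];
        }
    }
    // Gauss-Jordan elimination with partial pivoting.
    for (int col = 0; col < 3; ++col) {
        int piv = col;
        for (int r = col + 1; r < 3; ++r)
            if (std::fabs(A[r][col]) > std::fabs(A[piv][col])) piv = r;
        for (int j = 0; j < 4; ++j) std::swap(A[col][j], A[piv][j]);
        for (int r = 0; r < 3; ++r) {
            if (r == col) continue;
            double f = A[r][col] / A[col][col];
            for (int j = 0; j < 4; ++j) A[r][j] -= f * A[col][j];
        }
    }
    a = A[0][3] / A[0][0];
    b = A[1][3] / A[1][1];
    c = A[2][3] / A[2][2];
}

int main() {
    // Sample data from the question (y = x^2, so we expect a=1, b=0, c=0).
    double xs[5] = { 0, 1, 2, 4, 7 };
    double ys[5] = { 0, 1, 4, 16, 49 };
    double a, b, c;
    fitParabola(xs, ys, 5, a, b, c);
    std::cout << "a: " << a << " b: " << b << " c: " << c << '\n';
    // The 'virtual' extremum of the fitted parabola lies at x = -b / (2a).
    if (a != 0.0)
        std::cout << "extremum at x = " << -b / (2.0 * a) << '\n';
}
```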
Look at this Wikipedia page on Polynomial Regression.

Calculating and correcting bit errors using Hidden Markov Model in C++

I am a student and new to C++. It would be great if somebody could help me write a program. Here is what it is supposed to do:
1- generates a pseudo-random bit sequence (PN9 or PN15) with a 9- or 15-bit register
2- saves the bit sequence from step 1 into a bit array/buffer and displays the array.
3- calculates the transition probabilities for
1. 0 --> 0
2. 0 --> 1
3. 1 --> 0
4. 1 --> 1
4- asks the user to input a bit sequence
5- introduces some noise, i.e. flips some of the input bits
6- calculates and corrects the bit errors based on the transition probabilities calculated in step 3
Can anybody guide me or share their work with me on this?
You'll probably find many of the functions in the GNU Scientific Library (GSL) useful for your work. Apart from that, you'll have to do some work and then ask a specific question to get guidance.
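As a starting point that does not rely on GSL, here is a plain C++ sketch covering steps 1-3 only: generating a PN9 sequence with an LFSR and estimating the four transition probabilities from it. The feedback polynomial x^9 + x^5 + 1 is a common choice for PN9, but check it against your own specification; everything else here is illustrative.

```cpp
#include <cstdio>
#include <vector>

// Generate a PN9 sequence with a 9-bit LFSR using the common polynomial
// x^9 + x^5 + 1 (confirm against your spec); the full period is 511 bits.
std::vector<int> generatePN9(int length, unsigned seed = 0x1FF) {
    std::vector<int> bits;
    unsigned lfsr = seed & 0x1FF;                    // 9-bit state, must be non-zero
    for (int i = 0; i < length; ++i) {
        int out = lfsr & 1;                          // output bit
        int fb  = ((lfsr >> 0) ^ (lfsr >> 5)) & 1;   // taps at bits 0 and 5
        lfsr = (lfsr >> 1) | (fb << 8);              // shift, feed back into bit 8
        bits.push_back(out);
    }
    return bits;
}

// Estimate the transition probabilities P(next bit | current bit):
// counts[a][b] is the number of times bit a is immediately followed by bit b.
void transitionProbabilities(const std::vector<int>& bits, double p[2][2]) {
    int counts[2][2] = {{0, 0}, {0, 0}};
    for (size_t i = 0; i + 1 < bits.size(); ++i)
        ++counts[bits[i]][bits[i + 1]];
    for (int a = 0; a < 2; ++a) {
        int total = counts[a][0] + counts[a][1];
        for (int b = 0; b < 2; ++b)
            p[a][b] = total ? (double)counts[a][b] / total : 0.0;
    }
}

int main() {
    std::vector<int> seq = generatePN9(511);         // steps 1-2: generate and store
    for (size_t i = 0; i < seq.size(); ++i)          // display the array
        std::printf("%d", seq[i]);
    std::printf("\n");

    double p[2][2];
    transitionProbabilities(seq, p);                 // step 3
    std::printf("P(0->0)=%.3f P(0->1)=%.3f P(1->0)=%.3f P(1->1)=%.3f\n",
                p[0][0], p[0][1], p[1][0], p[1][1]);
    return 0;
}
```

Steps 4-6 (reading a user sequence, flipping bits to simulate noise, and correcting errors from the transition model) would build on these pieces; that is where a proper Hidden Markov Model decoder such as the Viterbi algorithm would come in.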