Confusion regarding Hidden Markov Models and Conditional Random Fields

I am a bit confused about Hidden Markov Models and Conditional Random Fields. I want to know if they are supervised or unsupervised learning methods?
Thanks

Well, from the several papers I have read, they are both supervised methods, and they need a labelled training set to be trained on.

Neither, in themselves: they are models for the underlying representation of knowledge. What happens during training is that certain transitions, being reinforced, acquire higher probability.
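To make the supervised case concrete, here is a minimal, illustrative sketch (plain Java, with toy data I made up) of the standard maximum-likelihood way to estimate HMM transition probabilities from labelled state sequences: you simply count transitions, so transitions seen more often end up with higher probability, as described above.

import java.util.*;

public class HmmTransitions {

    // Supervised maximum-likelihood estimate of transition probabilities:
    // P(next | current) = count(current -> next) / count(current -> anything).
    public static Map<String, Map<String, Double>> estimate(List<List<String>> sequences) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (List<String> seq : sequences) {
            for (int i = 0; i + 1 < seq.size(); i++) {
                counts.computeIfAbsent(seq.get(i), k -> new HashMap<>())
                      .merge(seq.get(i + 1), 1, Integer::sum);
            }
        }
        Map<String, Map<String, Double>> probs = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
            int total = 0;
            for (int c : e.getValue().values()) total += c;
            Map<String, Double> row = new HashMap<>();
            for (Map.Entry<String, Integer> t : e.getValue().entrySet()) {
                row.put(t.getKey(), t.getValue() / (double) total);
            }
            probs.put(e.getKey(), row);
        }
        return probs;
    }

    public static void main(String[] args) {
        // Toy labelled data: hypothetical part-of-speech tag sequences.
        List<List<String>> tagged = Arrays.asList(
            Arrays.asList("DET", "NOUN", "VERB"),
            Arrays.asList("DET", "NOUN", "VERB", "NOUN"));
        System.out.println(estimate(tagged)); // e.g. P(NOUN | DET) = 1.0
    }
}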

How do I use an MXNet model in C++?

After I have trained a model, how do I use it with C++?
I have tried the examples in incubator-mxnet/example/image-classification/predict-cpp/ and incubator-mxnet/cpp-package/example/.
As part of training, you should periodically evaluate your model against a validation set, for example at the end of each epoch. This gives you a good idea of the accuracy to expect when the model scores new data, so you can tell whether the model is really performing worse than expected at inference time.
If the validation accuracy of the model during training is no better than random (i.e. 1/number of classes), there could be many reasons for this, including: poor model selection, incorrect loss calculation, or the wrong optimization technique and hyperparameters (e.g. learning rate).
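As a trivial illustration of that baseline check (framework-agnostic rather than MXNet-specific; the class count and predictions below are made up):

public class SanityCheck {

    // Fraction of predictions that match the true labels.
    public static double accuracy(int[] predicted, int[] actual) {
        int correct = 0;
        for (int i = 0; i < predicted.length; i++) {
            if (predicted[i] == actual[i]) correct++;
        }
        return (double) correct / predicted.length;
    }

    public static void main(String[] args) {
        int numClasses = 10;                      // e.g. digit classification
        double randomBaseline = 1.0 / numClasses; // accuracy of blind guessing
        int[] predicted = {3, 1, 4, 1, 5};        // hypothetical model output
        int[] actual    = {3, 1, 4, 0, 5};        // hypothetical true labels
        double acc = accuracy(predicted, actual); // 0.8 here
        if (acc <= randomBaseline) {
            System.out.println("No better than random: check model selection, "
                + "loss calculation, optimizer and learning rate.");
        } else {
            System.out.printf("Validation accuracy: %.2f%n", acc);
        }
    }
}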
If the test accuracy of the model on unseen data is poor, you might be applying the model to a different domain from the one it was trained on. You can't use a model trained on handwritten characters (e.g. MNIST) to classify real-world objects (e.g. ImageNet).
If you need a C++ example of model training, take a look at this tutorial.

How to do prediction with Weka

I'm using Weka to do some text mining. I'm a little confused, so I'm here to ask how I can predict whether a new comment belongs to a specific class, given a set of comments that are classified as: notes, status of work, non-conformity, warning. With all the comments (9551) I have done a preprocessing step, obtaining a vector of tokens with the StringToWordVector filter, and then I used SimpleKMeans to obtain a number of clusters.
So the question is: if a user posts a new comment, can I predict with those data whether it belongs to a category of comment?
Sorry if my question is a little confused, but so am I.
Thank you
Trivial Training-validation-test
Create two datasets from your labelled instances: one will be the training set and the other the validation set. The training set will contain about 60% of the labelled data and the validation set the remaining 40%. There is no hard and fast rule for this split, but 60-40 is a good choice.
Use K-means (or any other clustering algorithm) on your training data to develop a model. Record the model's error on the training set. If the error is low and acceptable, you are fine. Save the model.
For now, your validation set will act as your test dataset. Apply the saved model to your validation set and record the error. What is the difference between the training error and the validation error? If both are low, the model's generalisation is "seemingly" good. (A sketch of these first steps in Weka's Java API follows this list.)
Prepare a test dataset that has all the features of your training and validation data but where the class/cluster is unknown.
Apply the model to the test data.
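A minimal sketch of the split-train-validate steps using Weka's Java API; the file name comments.arff and the cluster count of 4 (one per comment category) are assumptions, so adjust both for your data:

import java.util.Random;
import weka.clusterers.ClusterEvaluation;
import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class TrainValidateKMeans {
    public static void main(String[] args) throws Exception {
        // Placeholder: your dataset after StringToWordVector preprocessing.
        Instances data = DataSource.read("comments.arff");
        data.randomize(new Random(1));

        // 60-40 train/validation split, as described above.
        int trainSize = (int) Math.round(data.numInstances() * 0.6);
        Instances train = new Instances(data, 0, trainSize);
        Instances valid = new Instances(data, trainSize, data.numInstances() - trainSize);

        SimpleKMeans kmeans = new SimpleKMeans();
        kmeans.setNumClusters(4); // assumption: one cluster per comment category
        kmeans.buildClusterer(train);

        // Evaluate on the held-out validation set.
        ClusterEvaluation eval = new ClusterEvaluation();
        eval.setClusterer(kmeans);
        eval.evaluateClusterer(valid);
        System.out.println(eval.clusterResultsToString());

        // Assign a new (already vectorised) instance to a cluster.
        int cluster = kmeans.clusterInstance(valid.instance(0));
        System.out.println("Assigned to cluster " + cluster);
    }
}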
10-fold cross-validation
Use all of your labelled data instances for this task.
Apply K-means (or any other algorithm of your choice) in a 10-fold CV setup.
Record the training error and the CV error. Are they low? Is the difference between them low? If yes, save the model and apply it to the test data whose class/cluster is unknown.
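A sketch of the CV setup in Weka's Java API. Note that Weka's built-in cross-validation for clusterers expects a density-based clusterer and reports a log-likelihood rather than an error rate, so plain K-means has to be wrapped; the file name is again a placeholder:

import java.util.Random;
import weka.clusterers.ClusterEvaluation;
import weka.clusterers.MakeDensityBasedClusterer;
import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CrossValidateKMeans {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("comments.arff"); // placeholder name

        SimpleKMeans kmeans = new SimpleKMeans();
        kmeans.setNumClusters(4); // assumption: one cluster per category

        // K-means itself is not density-based, so wrap it for CV.
        MakeDensityBasedClusterer dbc = new MakeDensityBasedClusterer();
        dbc.setClusterer(kmeans);

        double logLikelihood =
            ClusterEvaluation.crossValidateModel(dbc, data, 10, new Random(1));
        System.out.println("10-fold CV log-likelihood: " + logLikelihood);
    }
}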
NB: The training/validation/test errors and their differences give you a "very initial" idea of whether your model is overfitting or underfitting. They are sanity tests. You need to perform other tests, such as learning curves, to see whether your model overfits, underfits, or fits well. If there appears to be an overfitting or underfitting problem, you will need to try different techniques to overcome it.

Data mining with Weka

I am learning how to do data mining and I am using this data set from UCI's website.
http://archive.ics.uci.edu/ml/datasets/Forest+Fires
The problem I am encountering is how to deal with the area class. My understanding from the description is that I need to apply ln(x+1) to area using AddExpression.
Am I going in the correct direction with this? Or are there other filters I should investigate? Thank you.
I'll try to answer your question based on the little information you provide. I haven't worked with the forest-fires data set, but by inspection I see that the target attribute "area" often has the value 0. You probably can't simply filter out the rows with area = 0; your dataset might become too small.
I think you are being asked to perform regression of some attribute(s) against log(area) in order to linearise it. However, when you try to calculate the log of the area, values such as log(0) are a problem, and values between 0 and 1 might also be problematic.
So a common fix is to add 1 to the value of "area" first. This introduces a small systematic error, but it removes all zero values, and you can still derive useful models from the log(x+1)-transformed dataset.
And yes, in Weka you do this with the AddExpression filter in the "Preprocess" panel, using an expression such as log(aN+1), where aN refers to the area attribute. This creates a new attribute; you can then remove the old area attribute.
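A sketch of the same transformation via the Java API; I'm assuming "area" is the last (13th) attribute, hence a13 in the expression language, and the ARFF file name is a placeholder:

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.AddExpression;
import weka.filters.unsupervised.attribute.Remove;

public class LogArea {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("forestfires.arff"); // placeholder name

        // New attribute log_area = natural log of (area + 1);
        // a13 refers to the 13th attribute in AddExpression's syntax.
        AddExpression logArea = new AddExpression();
        logArea.setExpression("log(a13+1)");
        logArea.setName("log_area");
        logArea.setInputFormat(data);
        Instances withLog = Filter.useFilter(data, logArea);

        // Optionally remove the original area attribute.
        Remove remove = new Remove();
        remove.setAttributeIndices("13");
        remove.setInputFormat(withLog);
        Instances transformed = Filter.useFilter(withLog, remove);
        System.out.println(transformed.toSummaryString());
    }
}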
Of course, in interpreting your model, you should be aware of the transformation. If you just want to find out what the significant independent attributes are in your linear regression model, I'd say the transformation does not matter. The data points are just shifted a little bit.

10-fold cross-validation in Weka

I am a bit confused about the difference between the 10-fold cross-validation available in Weka and traditional 10-fold cross-validation. I understand the concept of k-fold cross-validation, but from what I have read, 10-fold cross-validation in Weka is a little different.
In Weka, FIRST a model is built on ALL the data; only then is 10-fold cross-validation carried out. In traditional 10-fold cross-validation no model is built beforehand; 10 models are built, one in each iteration (please correct me if I'm wrong!). But if this is the case, what on earth does Weka do during 10-fold cross-validation? Does it again build a model in each of the ten iterations, or does it use the previously assembled model? Thanks!
As far as I know, cross-validation in Weka (like the other evaluation methods) is only used to estimate the generalisation error. That is, the (implicit) assumption is that you want to use the learned model on data that you didn't give to Weka (also called a "validation set"). Hence the model that you get is trained on the entire data.
During cross-validation, Weka trains and evaluates a number of different models (10 in your case) to estimate how well the learned model generalises. You don't actually see these models; they are only used internally. The model that is shown is not itself evaluated.
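A sketch of what this looks like in Weka's Java API (J48 and the file name are placeholders; any classifier behaves the same way):

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CvThenTrain {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data.arff"); // placeholder file name
        data.setClassIndex(data.numAttributes() - 1);

        J48 tree = new J48();

        // 10-fold CV: builds and evaluates 10 internal models to estimate
        // the generalisation error; none of them is kept.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());

        // The model Weka actually reports is trained on all the data.
        tree.buildClassifier(data);
    }
}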

Regression Tree Forest in Weka

I'm using Weka and would like to perform regression with random forests. Specifically, I have a dataset:
Feature1,Feature2,...,FeatureN,Class
1.0,X,...,1.4,Good
1.2,Y,...,1.5,Good
1.2,F,...,1.6,Bad
1.1,R,...,1.5,Great
0.9,J,...,1.1,Horrible
0.5,K,...,1.5,Terrific
.
.
.
Rather than learning to predict the most likely class, I want to learn the probability distribution over the classes for a given feature vector. My intuition is that using just the RandomForest model in Weka would not be appropriate, since it would be attempting to minimize its absolute error (maximum likelihood) rather than its squared error (conditional probability distribution). Is that intuition right? Is there a better model to be using if I want to perform regression rather than classification?
Edit: I'm actually thinking now that it may not be a problem after all. Presumably, classifiers learn the conditional probability P(Class | Feature1,...,FeatureN), and the resulting classification just finds the c in Class that maximises that distribution. Therefore, a RandomForest classifier should be able to give me the conditional probability distribution; I just had to think about it some more. If that's wrong, please correct me.
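For what it's worth, Weka exposes this directly: a classifier's distributionForInstance method returns the per-class probability estimates (for RandomForest, derived from the trees' votes). A minimal sketch, with the file name as a placeholder:

import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ClassDistribution {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data.arff"); // placeholder file name
        data.setClassIndex(data.numAttributes() - 1);

        RandomForest rf = new RandomForest();
        rf.buildClassifier(data);

        // P(class | features) for each class value of the first instance.
        double[] dist = rf.distributionForInstance(data.instance(0));
        for (int c = 0; c < dist.length; c++) {
            System.out.printf("%s: %.3f%n", data.classAttribute().value(c), dist[c]);
        }
    }
}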
If you want to predict the probabilities for each class explicitly, you need different input data. That is, you would need to replace the value to predict. Instead of one data set with the class label, you would need n data sets (for n different labels) with aggregated data for each unique feature vector. Your data would look something like
Feature1,...,Good
1.0,...,0.5
0.3,...,1.0
and
Feature1,...,Bad
1.0,...,0.8
0.3,...,0.1
and so on. You would need to learn one model for each class and run them separately on any data to be classified. That is, for each label you learn a model to predict a number that is the probability of being in that class, given a feature vector.
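If you do go that route, here is a rough sketch of the one-model-per-class idea using Weka's MakeIndicator filter. Note the simplifications: it trains on per-instance 0/1 indicators rather than the pre-aggregated probabilities described above, and the file name is a placeholder.

import weka.classifiers.functions.LinearRegression;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.MakeIndicator;

public class PerClassRegression {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data.arff"); // placeholder file name
        int classCol = data.numAttributes() - 1;       // nominal class is last

        // One regression model per class value: replace the nominal class
        // with a numeric 0/1 indicator for that value and regress on it.
        for (int c = 0; c < data.attribute(classCol).numValues(); c++) {
            MakeIndicator indicator = new MakeIndicator();
            indicator.setAttributeIndex("last");
            indicator.setValueIndices(String.valueOf(c + 1)); // 1-based value index
            indicator.setNumeric(true);                       // numeric 0/1 target
            indicator.setInputFormat(data);
            Instances numeric = Filter.useFilter(data, indicator);
            numeric.setClassIndex(numeric.numAttributes() - 1);

            LinearRegression lr = new LinearRegression();
            lr.buildClassifier(numeric);
            System.out.println("Model for class "
                + data.attribute(classCol).value(c) + ":\n" + lr);
        }
    }
}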
If you don't need the probabilities to be predicted explicitly, have a look at the Bayesian classifiers in Weka, which make use of probabilities in the models that they learn.