AutoML Vision (Google) single-label classification: output top-K results

Currently, the AutoML Vision API outputs a single label with its score.
For example, I trained a model with 3 classes:
A
B
C
Then, when I use Test & Use and upload another image, I get only:
[CURRENT OUTPUT]
Class A and 0.988437 / 0.99
Is there a way to get this kind of output for the top K classes (for example, the top 3, k=3)?
[DESIRED OUTPUT]
Class A and 0.988437 / 0.99
Class C and 0.3551 / 0.36
Class B and 0.1201 / 0.12
Sorted by score.
Thanks in advance.

Single-label classification assigns a single label to each classified image, so it returns only one predicted class.
Multi-label is more suited for your use case as it allows an image to be assigned multiple labels.
In the UI (which is what you seem to be using) you can specify the type of classification you want your custom model to perform when you create your dataset.
If, for any reason, you would like the option to get all (or the top k) predicted class scores for single-label classification, I suggest that you raise a Feature Request.
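If you do end up with per-class scores (for example from a multi-label model), sorting them and keeping the top k can be done on the client side. The following is a minimal sketch under that assumption: the `scores` dictionary and the `top_k` helper are hypothetical and do not reflect the actual AutoML response format.

```python
from typing import Dict, List, Tuple


def top_k(scores: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (label, score) pairs, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Scores copied from the desired output in the question (hypothetical input shape).
scores = {"A": 0.988437, "B": 0.1201, "C": 0.3551}
for label, score in top_k(scores, k=3):
    print(f"Class {label} and {score} / {score:.2f}")
```

With k=3 this prints the classes in the order A, C, B.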

Related

Training and Test Set in Weka Incompatible in Text Classification

I have two datasets about whether a sentence mentions a drug adverse event. Both the training set and the test set have only two fields: the text and the label {Adverse Event, No Adverse Event}. I used Weka with the StringToWordVector filter to build a Random Forest model on the training set.
I want to test the model by removing the class labels from the test set, applying the StringToWordVector filter to it, and running the model on it. When I try that, I get an error saying the training and test sets are not compatible, probably because the filter identifies a different set of attributes for the test set. How do I fix this and output the predictions for the test set?
The easiest way to do this for a one-off test is not to pre-filter the training set, but to use Weka's FilteredClassifier, configured with the StringToWordVector filter and your chosen classifier. This is explained well in this video from the More Data Mining with Weka online course.
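To illustrate why filtering the two sets separately breaks compatibility, here is a rough sketch in plain Python (not Weka's implementation). The helper names and the example sentences are made up for illustration; the point is only that the word list is learned from the training data and then reused unchanged for the test data.

```python
from typing import Dict, List


def fit_vocabulary(texts: List[str]) -> List[str]:
    """Collect the sorted set of lowercase words seen in the given texts."""
    return sorted({word for text in texts for word in text.lower().split()})


def to_vector(text: str, vocabulary: List[str]) -> List[int]:
    """Count occurrences of each vocabulary word; words outside it are dropped."""
    counts: Dict[str, int] = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return [counts.get(word, 0) for word in vocabulary]


train = ["severe rash after dose", "no side effects reported"]
test = ["rash and nausea after dose"]

vocabulary = fit_vocabulary(train)  # learned from the training data only
train_vectors = [to_vector(t, vocabulary) for t in train]
test_vectors = [to_vector(t, vocabulary) for t in test]

# Filtering the test set on its own yields a different attribute list.
print(fit_vocabulary(test) == vocabulary)            # False
print(len(test_vectors[0]) == len(train_vectors[0]))  # True
```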
For a more general solution, if you want to build the model once and then evaluate it on different test sets in the future, you need to use InputMappedClassifier:
Wrapper classifier that addresses incompatible training and test data
by building a mapping between the training data that a classifier has
been built with and the incoming test instances' structure. Model
attributes that are not found in the incoming instances receive
missing values, so do incoming nominal attribute values that the
classifier has not seen before. A new classifier can be trained or an
existing one loaded from a file.
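The mapping idea in that description can be sketched roughly as follows. This is only an illustration in plain Python, not Weka's code; `map_instance` is a hypothetical helper and `None` stands in for a missing value.

```python
from typing import Dict, List, Optional


def map_instance(model_attributes: List[str],
                 instance: Dict[str, float]) -> List[Optional[float]]:
    """Reorder an incoming instance to the model's attribute order.

    Attributes the model expects but the instance lacks become None
    (a missing value); extra incoming attributes are ignored.
    """
    return [instance.get(name) for name in model_attributes]


model_attributes = ["after", "dose", "rash"]
incoming = {"rash": 1.0, "nausea": 1.0, "after": 1.0}
print(map_instance(model_attributes, incoming))  # [1.0, None, 1.0]
```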
Weka requires a label even for the test data. It uses the labels, or "ground truth", of the test data to compare the model's results against them and measure the model's performance. How would you tell whether a model is performing well if you don't know whether its predictions are right or wrong? Thus, the test data needs to have the very same structure as the training data in Weka, including the labels. No worries, the labels are not used to help the model with its predictions.
The best way to go is to select cross-validation (e.g. 10-fold cross-validation), which will automatically split your data into 10 parts, using 9 for training and the remaining 1 for testing. This procedure is repeated 10 times so that each of the 10 parts is used once as test data. The final performance verdict is the average over all 10 rounds. Cross-validation gives you a quite realistic estimate of the model's performance on new, unseen data.
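The splitting procedure described above can be sketched in plain Python as follows. This is a simplified illustration, not Weka's implementation: it assigns instances to folds round-robin and does not shuffle or stratify, and `evaluate` is a hypothetical callback that trains on one index list and scores on the other.

```python
from typing import Callable, Iterator, List, Tuple


def k_fold_indices(n: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k folds over n items."""
    folds = [list(range(i, n, k)) for i in range(k)]  # round-robin assignment
    for i, test_fold in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test_fold


def cross_validate(n: int, k: int,
                   evaluate: Callable[[List[int], List[int]], float]) -> float:
    """Average the per-fold score returned by evaluate(train, test)."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n, k)]
    return sum(scores) / len(scores)


for train, test in k_fold_indices(20, k=10):
    print(len(train), len(test))  # 18 training and 2 test instances per fold
```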
What you were trying to do, namely using the exact same data for training and testing, is a bad idea, because the measured performance you end up with is way too optimistic. This means you'll get very impressive figures like 98% accuracy during testing, but as soon as you use the model on new, unseen data, your accuracy might drop to a much worse level.

Can we give the test data, without labelling them?

I came across this snippet in the Tensorflow documentation, MNIST For ML Beginners.
eval_data = mnist.test.images # Returns np.array
eval_labels = np.asarray(mnist.test.labels, dtype=np.int32)
Now I want to feed in my own test images without labelling them and have the model predict the labels. How do I achieve this?
Yes, you can, but then it would not be deep learning; it would be clustering (e.g. k-means clustering).
The basic idea is as follows:
Create two placeholders, one for the input and one for the centroids
Decide on a distance metric
Create the graph
Feed only the dataset (no labels) to run the graph
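The steps above can be sketched without TensorFlow at all. The following is a minimal k-means in plain Python, assuming squared Euclidean distance as the metric and hand-picked initial centroids; note that it produces cluster indices, not the original digit labels.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance, the metric chosen for this sketch."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points: List[Point], centroids: List[Point],
            iterations: int = 10) -> Tuple[List[int], List[List[float]]]:
    """Alternate between assigning points and recomputing centroids."""
    centroids = [list(c) for c in centroids]
    assignments: List[int] = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        assignments = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its assigned points.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
assignments, centroids = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(assignments)  # [0, 0, 1, 1]
```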

Empty confusion matrix in Weka with test data

I am classifying iris data using a decision tree (C4.5), Random Forest and Naive Bayes. I am using the dataset downloaded from iris-train and iris-test. When I train all the classifiers, everything is fine, with proper results for 'classifier output', 'Detailed accuracy with class' and 'confusion matrix'. But when I select the iris-test file under the test options in the Weka Explorer Classify tab, set 'output prediction' to 'csv' under 'more options' and click start, I get the result shown in the figure below. The 'classifier output' shows the classified samples correctly, but 'Detailed accuracy with class' and 'confusion matrix' contain only zeros. Any suggestion as to which option I am selecting wrongly? Thank you.
The confusion matrix shows you how well your trained classifier performs by comparing the actual class of the instances in the test set with the class that was predicted by the classifier. But you are supplying a test set with no class information, so there's nothing to compare against. This is why you see
Total Number of Instances 0
Ignored Class Unknown Instances 120
in the output in your screenshot.
Typically you would first evaluate the performance of your classifier using cross-validation, or a test set that has class information. Then you can use the trained classifier to classify unknown data, for example using the Re-evaluate model on current test set right-click option as described in the help.
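As a rough illustration of why the matrix comes out empty, here is a small sketch in plain Python (not Weka's code). The `confusion_matrix` helper is hypothetical, and `None` stands in for an unknown class value.

```python
from typing import Dict, List, Optional, Tuple


def confusion_matrix(actual: List[Optional[str]],
                     predicted: List[str]) -> Tuple[Dict[Tuple[str, str], int], int]:
    """Count (actual, predicted) pairs; instances with no actual class are skipped."""
    counts: Dict[Tuple[str, str], int] = {}
    ignored = 0
    for truth, guess in zip(actual, predicted):
        if truth is None:  # class unknown: nothing to compare against
            ignored += 1
            continue
        counts[(truth, guess)] = counts.get((truth, guess), 0) + 1
    return counts, ignored


predicted = ["setosa", "virginica", "versicolor"]
print(confusion_matrix([None, None, None], predicted))  # ({}, 3)
print(confusion_matrix(["setosa", "virginica", "virginica"], predicted))
```

The first call mirrors the situation in the question: every instance is ignored, so there is nothing to count.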

Finding a correlation between variable and class variable

I have a dataset which contains 7 numerical attributes and one nominal attribute, which is the class variable. I was wondering how I can find the best attribute for predicting the class attribute. Would finding the attribute with the largest information gain be the solution?
So the problem you are asking about falls under the domain of feature selection, and more broadly, feature engineering. There is a lot of literature online regarding this, and there are definitely a lot of blogs/tutorials/resources online for how to do this.
To give you a good link I just read through, here is a blog with a tutorial on some ways to do feature selection in Weka, and the same blog's general introduction on feature selection. Naturally there are a lot of different approaches, as knb's answer pointed out.
To give a short description though, there are a few ways to go about it: you can assign a score to each of your features (information gain, for example) and filter out features with 'bad' scores; you can treat finding the best feature subset as a search problem, where you take different subsets of the features and assess the accuracy of each in turn; and you can use embedded methods, which learn which features contribute most to the accuracy while the model is being built. Examples of embedded methods are regularization algorithms like LASSO and ridge regression.
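The first approach, scoring each feature, can be sketched for information gain as follows. This is a simplified illustration in plain Python with made-up data: it scores a numeric attribute for one fixed, hand-picked threshold, whereas a real tool would also search for the best split point.

```python
import math
from collections import Counter
from typing import List, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values: List[float], labels: List[str],
                     threshold: float) -> float:
    """Entropy reduction from splitting a numeric attribute at threshold."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted


labels = ["neg", "neg", "pos", "pos"]
print(information_gain([1.0, 2.0, 8.0, 9.0], labels, threshold=5.0))  # 1.0
print(information_gain([1.0, 8.0, 2.0, 9.0], labels, threshold=5.0))  # 0.0
```

The first attribute separates the classes perfectly and gets the maximum gain; the second tells you nothing about the class.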
Do you just want that attribute's name, or do you also want a quantifiable metric (like a t-value) for this "best" attribute?
For a qualitative approach, you can generate a classification tree with just one split, two leaves.
For example, with Weka's "diabetes.arff" sample dataset (n = 768), which has a structure similar to that of your dataset (all attributes numeric, and a class attribute with only two distinct categorical outcomes), I can set the minNumObj parameter to, say, 200. This means: create a tree with a minimum of 200 instances in each leaf.
java -cp $WEKA_JAR/weka.jar weka.classifiers.trees.J48 -C 0.25 -M 200 -t data/diabetes.arff
Output:
J48 pruned tree
------------------
plas <= 127: tested_negative (485.0/94.0)
plas > 127: tested_positive (283.0/109.0)
Number of Leaves : 2
Size of the tree : 3
Time taken to build model: 0.11 seconds
Time taken to test model on training data: 0.04 seconds
=== Error on training data ===
Correctly Classified Instances 565 73.5677 %
This creates a tree with one split on the "plas" attribute. For interpretation, this makes sense, because indeed, patients with diabetes have an elevated concentration of glucose in their blood plasma. So "plas" is the most important attribute, as it was chosen for the first split. But this does not tell you how important.
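As a quick check on how to read that output: each leaf is annotated with (instances reaching the leaf / instances misclassified there), so the training accuracy follows directly from the two leaves. A small sketch of the arithmetic:

```python
# Leaf annotations from the J48 output: (instances reaching the leaf, misclassified).
leaves = {"plas <= 127": (485.0, 94.0), "plas > 127": (283.0, 109.0)}

total = sum(n for n, _ in leaves.values())
correct = sum(n - errors for n, errors in leaves.values())
accuracy = 100.0 * correct / total
print(f"{correct:.0f} of {total:.0f} correct = {accuracy:.4f} %")
```

This gives 565 of 768, i.e. 73.5677 %, matching the "Correctly Classified Instances" line.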
For a more quantitative approach, maybe you can use (Multinomial) Logistic Regression. I'm not so familiar with this, but anyway:
In the Explorer GUI tool, choose "Classify" > Functions > Logistic.
Run the model. The coefficients and the odds ratios might contain what you need in a quantifiable form. Each odds ratio is exp(coefficient), so the further it is from 1, the more the odds change per unit increase of that attribute. The attributes are on different scales, though, so the raw values are not directly comparable, and I'm not sure this is the best measure. Maybe read on here, this answer by someone else.
java -cp $WEKA_JAR/weka.jar weka.classifiers.functions.Logistic -R 1.0E-8 -M -1 -t data/diabetes.arff
Here's the command line output
Options: -R 1.0E-8 -M -1
Logistic Regression with ridge parameter of 1.0E-8
Coefficients...
Class
Variable tested_negative
============================
preg -0.1232
plas -0.0352
pres 0.0133
skin -0.0006
insu 0.0012
mass -0.0897
pedi -0.9452
age -0.0149
Intercept 8.4047
Odds Ratios...
Class
Variable tested_negative
============================
preg 0.8841
plas 0.9654
pres 1.0134
skin 0.9994
insu 1.0012
mass 0.9142
pedi 0.3886
age 0.9852
=== Error on training data ===
Correctly Classified Instances 601 78.2552 %
Incorrectly Classified Instances 167 21.7448 %
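The two tables in that output are related: each odds ratio is the exponential of the corresponding coefficient. A short sketch that recomputes them from the coefficients printed above:

```python
import math

# Coefficients copied from the Logistic output above (class tested_negative).
coefficients = {
    "preg": -0.1232, "plas": -0.0352, "pres": 0.0133, "skin": -0.0006,
    "insu": 0.0012, "mass": -0.0897, "pedi": -0.9452, "age": -0.0149,
}

odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}
for name, ratio in odds_ratios.items():
    print(f"{name:5s} {ratio:.4f}")
```

Up to rounding of the printed coefficients, the results agree with the "Odds Ratios" table.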

Weka: Train and test set are not compatible

I'm trying to classify some web posts using Weka and a Naive Bayes classifier.
First I manually classified many posts (about 100 negative and 100 positive) and I created an .arff file with this form:
@relation classtest
@attribute 'post' string
@attribute 'class' {positive,negative}
@data
'RT #burnreporter: Google has now indexed over 30 trillion URLs. Wow. #LeWeb',positive
'A special one for me Soundcloud at #LeWeb ',positive
'RT #dianaurban: Lost Internet for 1/2 hour at a conference called #LeWeb. Ironic, yes?',negative
.
.
.
Then I open the Weka Explorer, load that file and apply the StringToWordVector filter to split the posts into single-word attributes.
Then, after doing the same with my own dataset, I select the Naive Bayes classifier (in the Classify tab of Weka) and choose "Supplied test set", and it returns "Train and test set are not compatible". What can I do? Thanks!
Probably the ordering of the attributes is different in train and test sets.
You can use batch filtering as described in http://weka.wikispaces.com/Batch+filtering
I used a batch filter but still have the problem. Here is what I did:
java -cp /usr/share/java/weka.jar weka.filters.unsupervised.attribute.NumericToNominal -R last -b -i trainData.arff -o trainDataProcessed.csv.arff -r testData.arff -s testDataProcessed.csv.arff
I then get the error below:
Input file formats differ.
Later, I figured out two ways to make the trained model work on a supplied test set.
Method 1.
Use the Knowledge Flow. For example, something like the following:
CSVLoader (for the training set) -> ClassAssigner -> TrainingSetMaker -> (classifier of your choice) -> ClassifierPerformanceEvaluator -> TextViewer
CSVLoader (for the test set) -> ClassAssigner -> TestSetMaker -> (the same classifier instance as above) -> PredictionAppender -> CSVSaver
First load the data from the CSVLoader or ArffLoader for the training set. The model will be trained. After that, load the data from the loader for the test set. It will evaluate the model (the classifier, for example) on the supplied test set, and you can see the result in the TextViewer (connected to the ClassifierPerformanceEvaluator) and get the saved result from the CSVSaver or ArffSaver connected to the PredictionAppender. An additional column, "classified as", will be added to the output file. In my case, I used "?" for the class column in the supplied test set, since the class labels are not available.
Method 2.
Combine the training and test sets into one file, so that the exact same filter is applied to both. Then separate the training set and the test set again by applying an instance filter. Since I use "?" as the class label in the test set, it is not visible in the instance filter indices. Hence, just select the indices that you can see in the attribute values to be removed when you apply the instance filter; only the test data will be left. Save it and load it as the supplied test set on the Classify page. This time it will work. I guess it is the class attribute that causes the incompatible train and test set issue, as many classifiers require a nominal class attribute, whose value is converted to an index into the available values of the class attribute, according to http://weka.wikispaces.com/Why+do+I+get+the+error+message+%27training+and+test+set+are+not+compatible%27%3F
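Method 2 can be sketched roughly as follows in plain Python (not Weka). The rows, labels and the `vectorize_together` helper are hypothetical; as in the answer, "?" marks a test row whose class is unknown.

```python
from typing import List, Tuple

Row = Tuple[str, str]  # (text, class label); "?" marks an unlabeled test row


def vectorize_together(rows: List[Row]):
    """Build one word list over all rows, then split by whether the class is known."""
    vocabulary = sorted({w for text, _ in rows for w in text.lower().split()})
    vectors = [([text.lower().split().count(w) for w in vocabulary], label)
               for text, label in rows]
    train = [(v, label) for v, label in vectors if label != "?"]
    test = [v for v, label in vectors if label == "?"]
    return vocabulary, train, test


rows = [
    ("severe rash after dose", "positive"),
    ("no side effects", "negative"),
    ("rash after dose", "?"),
]
vocabulary, train, test = vectorize_together(rows)
print(len(vocabulary), len(train), len(test))  # 7 2 1
```

Because both parts are filtered in one pass, they end up with identical attributes in the same order.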