WEKA software analysis for calculating total accuracy - data-mining

Hi, I am using WEKA to analyze some data, but I am having trouble working out how to calculate the total accuracy from the output data.
The partial output is below:
Detailed Accuracy By Class
TP Rate FP Rate Precision Recall F-Measure ROC Area Class
0.85 0.415 0.794 0.85 0.821 0.762 tested_negative
0.585 0.15 0.676 0.585 0.627 0.762 tested_positive
Weighted Avg. 0.758 0.323 0.753 0.758 0.754 0.762
From the above, what will the total accuracy be?

What is the confusion matrix from your WEKA output?
In the general case, you need it to calculate accuracy.
And yes, I think "total accuracy" in this case means accuracy in the standard sense:
accuracy = (TP + TN) / (TP + TN + FP + FN)
(from http://en.wikipedia.org/wiki/Accuracy_and_precision)
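For this particular output, note that WEKA's weighted-average TP rate is mathematically the same thing as the overall accuracy (it is the per-class correct counts summed and divided by the total), so the total accuracy here is about 0.758, i.e. 75.8%. If you want to compute it yourself from the confusion matrix, here is a minimal Python sketch; the class counts below are hypothetical, chosen only so the per-class TP rates come out close to the ones shown above, so substitute the matrix from your own WEKA output.

```python
import numpy as np

# Hypothetical 2x2 confusion matrix (rows = actual class, columns = predicted class).
# The counts are made up to roughly reproduce the 0.85 / 0.585 TP rates above.
cm = np.array([[425,  75],    # tested_negative: 425 classified correctly, 75 not
               [111, 157]])   # tested_positive: 111 classified incorrectly, 157 correctly

accuracy = np.trace(cm) / cm.sum()        # (TP + TN) / total instances
print(f"total accuracy = {accuracy:.3f}") # ~0.758 with these counts
```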

You can see the correctly classified instances reported in the Summary section (a little above the part that reports the accuracy by class). Next to that label you can see a number (the count of correctly classified instances) and a percentage (which is the accuracy).

Related

Google Cloud Platform - Vertex AI - is there a way to look at a chart of training performance over time?

I'd like to know how the training performance changes over the course of the training. Is there any way to access that via the Vertex AI AutoML service?
Unfortunately, it is not possible to see the training performance over the course of the training. Vertex AI AutoML only shows whether the training job is running or not.
The only available information is how well the model performed on the test set after training. This can be seen in the "Evaluation" tab in AutoML. You can refer to the Vertex AI AutoML Evaluation documentation for further reading.
AutoML provides evaluation metrics that could help you determine the performance of your model. Some of the evaluation metrics are precision, recall, and confidence thresholds. These vary depending on what AutoML product you are using.
For example, if you have an image classification model, the following evaluation metrics are available:
AuPRC: The area under the precision-recall (PR) curve, also referred to as average precision. This value ranges from zero to one, where a higher value indicates a higher-quality model.
Log loss: The cross-entropy between the model predictions and the target values. This ranges from zero to infinity, where a lower value indicates a higher-quality model.
Confidence threshold: A confidence score that determines which predictions to return. A model returns predictions that are at this value or higher. A higher confidence threshold increases precision but lowers recall. Vertex AI returns confidence metrics at different threshold values to show how the threshold affects precision and recall (see the sketch after this list).
Recall: The fraction of predictions with this class that the model correctly predicted. Also called true positive rate.
Precision: The fraction of classification predictions produced by the model that were correct.
Confusion matrix: A confusion matrix shows how often a model correctly predicted a result. For incorrectly predicted results, the matrix shows what the model predicted instead. The confusion matrix helps you understand where your model is "confusing" two results.
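Vertex AI does not expose this computation as code, but the trade-off described under "Confidence threshold" is easy to see with a small scikit-learn sketch (the labels and confidence scores below are made up purely for illustration): raising the threshold raises precision and lowers recall.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Made-up ground-truth labels and predicted confidence scores.
y_true  = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1])
y_score = np.array([0.10, 0.40, 0.35, 0.80, 0.65, 0.20, 0.90, 0.55, 0.70, 0.45])

for threshold in (0.3, 0.5, 0.7):
    y_pred = (y_score >= threshold).astype(int)   # keep only predictions at/above the threshold
    p = precision_score(y_true, y_pred)
    r = recall_score(y_true, y_pred)
    print(f"threshold={threshold:.1f}  precision={p:.2f}  recall={r:.2f}")
# The output shows precision rising and recall falling as the threshold increases.
```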

How to eliminate outliers while calculating standard deviation in a measure in PowerBI measure

I am relatively new to Power BI. I want to calculate the standard deviation of a parameter (e.g. temperature) for each batch, based on different filters.
The standard deviation has to be calculated by a measure,
and it has to be displayed in a card,
but I need to eliminate outliers before calculating it.
How can I do that in a measure?
The outlier condition: values greater than the 99th percentile or less than the 1st percentile are to be considered outliers.
If you need to work with percentiles, then use the PERCENTILE.INC / PERCENTILE.EXC functions in your logic:
https://www.youtube.com/watch?v=5AxtNdJ5wqk
https://dax.guide/percentile-exc/
https://dax.guide/percentile-inc/
Useful statistical patterns:
https://www.daxpatterns.com/statistical-patterns/
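A DAX measure for this would typically combine PERCENTILE.INC (or PERCENTILE.EXC) with CALCULATE and FILTER; the sketch below only illustrates the underlying logic in Python/pandas on made-up data, so you can check what the measure should return: drop values outside the 1st-99th percentile band, then take the standard deviation of what is left.

```python
import numpy as np
import pandas as pd

# Made-up temperature readings with a few injected outliers.
rng = np.random.default_rng(0)
temps = rng.normal(30.0, 1.5, size=500)
temps[:3] = [80.0, -5.0, 95.0]                      # obvious outliers

df = pd.DataFrame({"Temperature": temps})

lo = df["Temperature"].quantile(0.01)               # 1st percentile
hi = df["Temperature"].quantile(0.99)               # 99th percentile
kept = df.loc[df["Temperature"].between(lo, hi), "Temperature"]

print("std with outliers   :", round(df["Temperature"].std(), 3))
print("std without outliers:", round(kept.std(), 3))
```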

Using class weight to balance data set lowers accuracy in RBF SVM

I have been using sklearn to learn on some data. This is a binary classification task and I am using an RBF kernel. My data set is quite unbalanced (80:20) and I'm using only 120 samples, with about 10 features (I've been experimenting with a few fewer). Since I set class_weight="auto", the accuracy I've calculated from a cross-validated (10-fold) grid search has dropped dramatically. Why?
I will include a couple of validation accuracy heatmaps to demonstrate the difference.
NOTE: the top heatmap is from before class_weight was changed to "auto".
Accuracy is not the best metric to use when dealing with an unbalanced dataset. Say you have 99 positive examples and 1 negative example: if you predict every output to be positive, you still get 99% accuracy, even though you misclassified the only negative example. You probably got high accuracy in the first case because the predictions were biased toward the class with the larger number of samples.
When you set class_weight="auto", the imbalance is taken into consideration, so your predictions may have moved toward the center; you can cross-check this by plotting histograms of the predictions.
My suggestion: don't use accuracy as the performance metric; use something like the F1 score or AUC.
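For example, with scikit-learn's GridSearchCV you can simply switch the scoring argument from the default accuracy to "f1" or "roc_auc". Below is a minimal sketch on a made-up 80:20 dataset (note that in newer scikit-learn versions class_weight="auto" has been replaced by "balanced"):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Made-up 80:20 imbalanced problem, roughly mirroring the question's setup.
X, y = make_classification(n_samples=120, n_features=10,
                           weights=[0.8, 0.2], random_state=0)

param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}

# Score the grid search with F1 (or "roc_auc") instead of plain accuracy.
search = GridSearchCV(SVC(kernel="rbf", class_weight="balanced"),
                      param_grid,
                      scoring="f1",
                      cv=StratifiedKFold(n_splits=10))
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```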

Frequency & amplitude

I have a data (file) which contains 2 columns:
Seconds, Volts
0, -0.4238353
2.476346E-08, -0.001119718
4.952693E-08, -0.006520569
(..., thousands of similar entries in file)
4.516856E-05, -0.0002089292
How do I calculate the frequency of the highest-amplitude wave? (Each wave has a fixed frequency.)
Is there any difference between calculating frequency from seconds and amplitude vs. seconds and volts? In the linked "Frequency & amplitude" question there is a worked example with seconds and amplitude, so it might help in my case.
Your data is in the time domain; the question is about the frequency domain. Your course should have told you how the two are related. In two words: Fourier transform. In practical programming we use the FFT: the Fast Fourier Transform. If the input is a fixed-frequency sine wave, your FFT output will have one hump. Model that as a parabola and find the peak of the parabola. (Simply taking the highest-amplitude bin in the FFT is about 10 times less accurate.)
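A minimal NumPy sketch of that recipe, assuming the data sits in a comma-separated file with the "Seconds, Volts" header shown above (the file name and parsing details are assumptions):

```python
import numpy as np

# "signal.txt" is a hypothetical name for the two-column "Seconds, Volts" file shown above.
t, v = np.loadtxt("signal.txt", delimiter=",", skiprows=1, unpack=True)

dt = t[1] - t[0]                          # sample spacing (assumes uniform sampling)
spectrum = np.abs(np.fft.rfft(v))         # magnitude spectrum of the real-valued signal
freqs = np.fft.rfftfreq(len(v), d=dt)     # frequency of each FFT bin in Hz

k = np.argmax(spectrum[1:]) + 1           # strongest bin, skipping the DC component
# Parabolic interpolation around the peak bin gives a sub-bin frequency estimate.
a, b, c = spectrum[k - 1], spectrum[k], spectrum[k + 1]
offset = 0.5 * (a - c) / (a - 2 * b + c)
print("dominant frequency (Hz):", (k + offset) * freqs[1])
```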
The link you give is horrible; I've downvoted the nonsense answer there. In your example, time starts at t=0 and the solution given there would divide by zero (1/0).

Understanding cost-sensitive evaluation in Weka (cost matrix)

I am using Weka 3.7.1
I am attempting to analyze sports predictions for baseball using Weka. I would like to use a cost matrix because the cost of different outcomes is not the same at the sportsbook where I gamble on the games. My data set is simple: it is a set of predictions with a nominal class {WIN,LOSS}. For this question, the attributes are not a concern.
In the WEKA Explorer, after loading my ARFF file I can set up a cost matrix from Classify -> More Options... -> Cost-sensitive evaluation -> Set.... A 2x2 grid appears in the cost-sensitive evaluation dialog after I set the number of classes to 2.
Here are the values I would like to enter into the cost matrix:
Correctly classified as loss, cost is 0 (I did not wager)
Incorrectly classified as loss, cost is 0 (I did not wager)
Correctly classified as win, cost is -.909 (I won .909 dollars)
Incorrectly classified as win, cost is 1.0 (I lost a dollar)
Observe that, to stay true to this being a "cost matrix", I set my profit to a negative value (profit being the opposite of cost), and I set the loss to a positive number (because it cost me when I lost the wager).
After some reflection I decided to use the following grid, but I have no clue whether I did this correctly; please let me know:
a      b       <---- "classified as"
0      1.0     a = LOSS
0      -.909   b = WIN
And here is my probably faulty logic, reading positions as (col, row):
(0,0) of grid = 0: classified as LOSS, and was LOSS
(0,1) of grid = 0: classified as LOSS, but was WIN
(1,0) of grid = 1.0: classified as WIN, but was LOSS
(1,1) of grid = -.909: classified as WIN, and was WIN
And of course, (0,0) and (0,1) represent the classifier predicting a LOSS; in those cases I do not wager, and therefore there is no cost.
On the other hand, (1,0) and (1,1) represent the classifier predicting a WIN; in those cases I place a wager, and therefore there is a cost associated.
One other item is a source of great confusion: after I set up the cost matrix and run a classifier, the output report contains the following:
Evaluation cost matrix:
0 1
0 0.91 <--- notice that this is not a negative value!
And as you can see, in the report (1,1) is 0.91 when I had actually entered -.909. I did find another post about this topic, but it does not explain why the negative value became positive.
Thank you in advance. Please note that these are answerable questions; however, if you want to provide some guidance I would be very happy as I am a newbie still trying to build a framework of understanding.
A cost matrix is a way to change the threshold value for the decision boundary.
It is explained in the following paper:
http://research.ijcaonline.org/volume44/number13/pxc3878677.pdf
Looking at your cost matrix, it seems a small correction is required, e.g.:
0     cost
cost  0
Just for explanation, consider the following cost matrix:
a b
c d
This is the general format of a cost matrix that I have observed for two-class problems. When an instance is classified at the a or d location (the diagonal), there is no need to incorporate any cost.
So the point here is that the cost only comes into the picture when there is a misclassification, i.e. at the b or c location.
But since you have written a negative value as the cost at place d, it creates confusion. (Kindly explain what you mean by a negative cost.)
An example cost matrix could be:
0  1
10 0
which says that the cost of classifying an example as a false positive is 10 times higher than the cost of misclassifying a similar example as a false negative. Moreover, there is no cost when examples are classified correctly.
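To make the arithmetic concrete, here is a small Python sketch (not WEKA code; the counts are hypothetical) of how a cost-sensitive evaluation combines a confusion matrix with the cost matrix from the question. With this reading, a negative entry at the d position simply acts as a reward for correctly predicted wins, which is why the total "cost" can come out negative (a profit).

```python
import numpy as np

# Rows = actual class, columns = predicted class, both in the order (LOSS, WIN).
confusion = np.array([[50, 10],     # actual LOSS: 50 predicted LOSS, 10 predicted WIN (hypothetical counts)
                      [15, 25]])    # actual WIN : 15 predicted LOSS, 25 predicted WIN

cost = np.array([[0.0,  1.0],       # actual LOSS: no wager if predicted LOSS (0), lost a dollar if predicted WIN (1.0)
                 [0.0, -0.909]])    # actual WIN : no wager if predicted LOSS (0), won 0.909 if predicted WIN (-0.909)

total_cost = (confusion * cost).sum()
print("total cost:", total_cost)    # a negative total cost means the betting strategy made money
```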