How to measure precision and recall of rules generated by Apriori using Weka?
I am trying to analyze a dataset in WEKA with a nominal class. The other attributes have both numeric and nominal values, but the final class has nominal values. All of the algorithm options except a very few are showing up. Can you please tell me why this is happening?
I have a dataset of online product reviews (without any grades/stars/etc.). To this dataset I applied the integrated Power BI AI Insights Text Analytics sentiment analysis model and got a sentiment score for each review. Next, I transformed the score into discrete text values: POSITIVE, NEGATIVE, and NEUTRAL.
The dataset was artificially created by me, so I know the polarity of each comment. Now I want to compare the predicted value to the actual value. I have done this by adding a new column that compares the actual value with the predicted value and displays "PREDICTED" if the value was predicted correctly and "NOT PREDICTED" if the prediction was wrong (regardless of whether it was positive, negative, or neutral). My goal is to calculate some model metrics so I can evaluate the capabilities of this integrated Power BI model and to visualize the results. How can I do this? Is accuracy the first thing I should start with? If so, how can I calculate and visualize a result like accuracy?
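For concreteness, the comparison column described above can be expressed as a DAX calculated column along these lines (the table and column names here are hypothetical):

PredictionCheck =
IF (
    Reviews[ActualSentiment] = Reviews[PredictedSentiment],  // placeholder names
    "PREDICTED",
    "NOT PREDICTED"
)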
Thank you for all your answers in advance.
Yes, take accuracy into consideration first. If 70 or 80 percent or more of the results are accurate, you can reasonably rely on the Power BI AI Insights Text Analytics sentiment analysis, and you can then build your visuals for the sentiment data. But if predicted and not-predicted results occur in roughly a 50-50 split, you may want to try a third-party sentiment analysis service such as Google or Alchemy.
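As a sketch, assuming a Reviews table with ActualSentiment and PredictedSentiment columns (names hypothetical, adjust to your model), an accuracy measure in DAX could look like:

Accuracy =
DIVIDE (
    COUNTROWS (
        FILTER ( Reviews, Reviews[ActualSentiment] = Reviews[PredictedSentiment] )
    ),
    COUNTROWS ( Reviews )
)

Show the measure in a card visual; a matrix with the actual sentiment on rows and the predicted sentiment on columns then works as a simple confusion matrix and shows which classes the model confuses.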
I am trying to calculate the standard deviation from a hospital Average Daily Census (ADC) report. The report is broken down by floor and by unit. The raw data is midnight census events for each patient, hundreds of them every day. I also have a filter on the report for different clinical services, so the standard deviation needs to be calculated on the fly as I change the filter.
The first picture below shows the results unfiltered. The second shows the results with some services selected.
I have found one way to calculate the standard deviation, but it has to be taken from a specific field. Since my ADC is itself a calculated value, this does not work.
I also saw that you can create a table (in DAX?), but I have not been able to get that to work, and I am not sure it can be dynamic and recalculate after filtering.
Is what I am trying to do even possible in Power BI?
Thanks
It sounds like you want the standard deviation of ADC over time at a daily granularity.
If this is correct, the basic approach is to calculate the measure for each day and then take the standard deviation of that set of values. In DAX, it will look something like this:
StdDevADC =
STDEVX.S (
    SUMMARIZECOLUMNS ( DateTable[Date], "ADCThisDate", [ADC] ),
    [ADCThisDate]
)
Even if this isn't exactly what you need, it should give you an idea of the approach: calculate [ADC] for each element of the dimension you want to take the standard deviation over, and then use the iterator version of the standard deviation function to aggregate over the table you just built.
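One caveat, as an assumption based on general DAX behavior rather than on your model: SUMMARIZECOLUMNS is not always supported inside a measure, so if it raises an error, an equivalent sketch uses ADDCOLUMNS over the date values instead:

StdDevADC =
STDEVX.S (
    ADDCOLUMNS (
        VALUES ( DateTable[Date] ),    -- one row per date in the current filter context
        "ADCThisDate", [ADC]           -- the ADC measure evaluated for that date
    ),
    [ADCThisDate]
)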
I am using Weka for data mining on a dataset. I can find the median and standard deviation using the Explorer, but not the range, quartiles, variance, or mode. Is there any configuration required in the tool for these, or is it simply not possible with the tool?
You can use a filter, namely the unsupervised attribute filter "AddExpression" or "MathExpression", to calculate such a statistic for a single attribute.
Obviously this is primitive, and you cannot do it for every attribute in one fell swoop.
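For example, here is a command-line sketch (option names as I remember them, so treat this as a starting point and check the filter's own help): MathExpression exposes MIN and MAX inside its expression, so MAX-MIN replaces each value of the selected attribute with its range:

java -cp weka.jar weka.filters.unsupervised.attribute.MathExpression \
    -E "MAX-MIN" -R 1 \
    -i input.arff -o output.arff

The same filter, with the same expression, can be configured in the Explorer's Preprocess tab under filters > unsupervised > attribute.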
I am doing a logistic regression of a binary dependent variable on a four-value multinomial (categorical) independent variable. Somebody suggested to me that it was better to put the independent variable in as a multinomial rather than as three binary variables, even though SAS seems to treat the multinomial as if it were three binaries. Their reason was that, given a multinomial, SAS would report standard errors and confidence intervals for the three binary variables 'relative to the omitted variable', whereas given three binaries it would report them 'relative to all cases where the variable was zero'.
When I run the regression both ways and compare, nearly all the results are the same, including the fit statistics, the odds ratio estimates, and the confidence intervals for the odds ratios. But the coefficient estimates and their confidence intervals differ between the two.
From my reading of the underlying theory, as presented in Hosmer and Lemeshow's 'Applied Logistic Regression', the estimates and confidence intervals SAS reports for the coefficients are consistent with the theory for the regression using three binary independent variables, but not for the one using the 4-value multinomial.
I think the difference may have something to do with SAS's choice of 'design variables': for the binary regression the values are 0 and 1, whereas for the multinomial they are -1 and 1. But I don't really understand what SAS is doing there.
Does anybody know how SAS's approach differs between the two regressions, and/or can explain the differences in the outputs?
Here is a link to the SAS output:
SAS output
And here is the SAS code:
proc logistic data=tab descending;
    /* three separate 0/1 binary variables */
    class binB binC binD / descending;
    model y = binD binC binB;
run;

proc logistic data=tab descending;
    /* one four-level multinomial variable */
    class multi / descending;
    model y = multi;
run;
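One thing worth checking, offered as a sketch rather than a verified fix: PROC LOGISTIC's CLASS statement uses effect coding by default (design values -1 and 1), while separate 0/1 dummy variables amount to reference coding, which would explain exactly the design-variable difference described above. Requesting reference coding explicitly should then make the multinomial run reproduce the binary one:

proc logistic data=tab descending;
    /* PARAM=REF asks for 0/1 design variables; REF= picks the omitted level */
    class multi / param=ref ref=first;
    model y = multi;
run;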