AttributeSelectedClassifier - How to deal with error "A nominal attribute (likes) cannot have duplicate labels ('(0.045455-0.045455]')" - weka

I am using KNIME to run a Weka node, AttributeSelectedClassifier.
But I keep getting this exception claiming that my attribute is nominal and has duplicate values.
However, the attribute is numeric, and duplicate values are entirely expected in the dataset!
I found similar topics to this one, but none of them covers how to choose the scalar to scale the values by.
1st Question: I would be happy if someone could explain this behavior. Why are duplicate values a problem?
Anyway, one of the threads on a similar topic recommended scaling the values by a large enough number (a scalar).
Based on that, I multiplied the values by 10^6 and got an error about this value: 27027.027027-27027.027027
I multiplied by 10^7 and then got an error about this value: 270270.27027-270270.27027
When I multiplied by 10^8, it succeeded.
2nd Question: What is the right way to deal with this, and how can I programmatically choose the scalar to scale by?
The full error:
ERROR AttributeSelectedClassifier - Execute failed: IllegalArgumentException in Weka during training. Please verify your settings. A nominal attribute (Meanlikes) cannot have duplicate labels ('(0.045455-0.045455]').
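My understanding (an assumption, not something from Weka's documentation) is that the error appears because the discretization step formats bin boundaries to a fixed number of decimal places, so cut points that are very close together collapse into the same printed label. One way to choose the scalar programmatically is to make the smallest gap between distinct values at least 1 after scaling, so adjacent boundaries can no longer round to identical strings. A minimal sketch:

```python
# Sketch: pick the smallest power of ten that stretches the minimum gap
# between distinct attribute values to >= 1. This heuristic is my own
# suggestion, not an official Weka recipe.
import math

def choose_scale(values):
    distinct = sorted(set(values))
    if len(distinct) < 2:
        return 1  # nothing to separate
    min_gap = min(b - a for a, b in zip(distinct, distinct[1:]))
    if min_gap >= 1:
        return 1  # values are already well separated
    # smallest power of ten that makes min_gap * scale >= 1
    return 10 ** math.ceil(-math.log10(min_gap))

scale = choose_scale([0.01, 0.02, 0.025])  # closest pair differs by 0.005
```

With the sample values the closest pair differs by 0.005, so the function returns 1000; multiplying the attribute by that factor keeps every pair of distinct boundaries at least one unit apart.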

Related

strange DataFormat.Error: We couldn't convert to Number. Details: nan

DataFormat.Error: We couldn't convert to Number
Details: nan
I keep getting the above error and I just can't get it solved.
The same error message appears when:
I try to perform Table.ExpandTableColumn
I try to filter only rows with errors
The same error occurs whether I specify column(s) in Table.SelectRowsWithErrors or not.
I don't expect this table to contain errors; in that case the call should just return an empty table (and it indeed does for other tables).
I don't have any division in my data model, so it's really strange how nan could have got in (it is the result of 0/0 in Power Query).
Update
It seems I have some corrupted rows in my source data; after filtering down my table, there is a row with "Error" at the bottom:
Unfortunately I can't see its details, as clicking on one of the "Error"s gives an error message:
Also, when I try to remove errors, that row is still not removed:
The source data is in Excel (200k+ rows). I removed all empty rows below the used range, in case an extra used row there was causing the issue, but it didn't help.
Finally, I was able to solve the problem by adding Table.RemoveRowsWithErrors much earlier in the code, at a point where the error was present in only one column and had not propagated to the whole row.
As is also suggested here: https://app.powerbi.com/groups/me/apps/3605fd5a-4c2e-46aa-bee9-1e413fc6028a/reports/dd7a5d70-dca1-44c5-a8f4-7af5961fe429/ReportSection
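For illustration, a minimal M sketch of that fix, with hypothetical file path, sheet, and column names (the real query will differ):

```
let
    Source = Excel.Workbook(File.Contents("C:\data\source.xlsx"), null, true),
    Raw = Source{[Item = "Data", Kind = "Sheet"]}[Data],
    // remove error rows while the error is still confined to one column,
    // before any step that could propagate it across the whole row
    Clean = Table.RemoveRowsWithErrors(Raw, {"Amount"}),
    Expanded = Table.ExpandTableColumn(Clean, "Details", {"Value"})
in
    Expanded
```

The key point is that Table.RemoveRowsWithErrors runs on the raw step, before the expand, rather than at the end of the query.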

Weka not display Correctly classified instances as output

I am new to Weka. I have a CSV dataset with 5000 samples; here are 20 samples of it. When I load this dataset into Weka it looks OK, but when I run the kNN algorithm it gives a result it is not supposed to give. Here is the sample data:
a,b,c,d
74,85,123,1
73,84,122,1
72,83,121,1
70,81,119,1
70,81,119,1
69,80,118,1
70,81,119,1
70,81,119,1
76,87,125,1
76,87,125,1
82,92,146,2
74,86,140,2
68,80,134,2
64,76,130,2
64,75,132,2
83,96,152,2
72,85,141,2
71,83,141,2
69,81,139,2
65,79,137,2
Here is the result:
=== Cross-validation ===
=== Summary ===
Correlation coefficient 0.6148
Mean absolute error 0.2442
Root mean squared error 0.4004
Relative absolute error 50.2313 %
Root relative squared error 81.2078 %
Total Number of Instances 5000
It is supposed to give this kind of result:
Correctly classified instances: 69 92%
Incorrectly classified instances: 6 8%
What could the problem be? What am I missing? I tried all the other algorithms, but they all give the same kind of output. I have used the sample Weka datasets, and they all work as expected.
The IBk algorithm can be used for regression (predicting the value of a numeric response for each instance) as well as for classification (predicting which class each instance belongs to).
It looks like all the values of the class attribute in your dataset (column d in your CSV) are numbers. When you load this data into Weka, Weka therefore guesses that this attribute should be treated as a numeric one, not a nominal one. You can tell this has happened because the histogram in the Preprocess tab looks something like this:
instead of like this (coloured by class):
The result you're seeing when you run IBk is the result of a regression fit (predicting a numeric value of column d for each instance) instead of a classification (selecting the most likely nominal value of column d for each instance).
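To see concretely why the attribute type changes the output, here is a pure-Python sketch (not Weka's implementation) of kNN used both ways on rows taken from the question's sample data; the query instance is made up for illustration:

```python
# With a numeric class, kNN averages the neighbours' values (regression);
# with a nominal class, it takes a majority vote (classification).
from collections import Counter

data = [
    (74, 85, 123, 1), (73, 84, 122, 1), (72, 83, 121, 1), (70, 81, 119, 1),
    (82, 92, 146, 2), (74, 86, 140, 2), (68, 80, 134, 2), (64, 76, 130, 2),
]

def k_nearest(query, k=3):
    # sort rows by squared Euclidean distance over the three feature columns
    return sorted(data, key=lambda row: sum((q - r) ** 2
                                            for q, r in zip(query, row[:3])))[:k]

query = (74, 85, 131)  # hypothetical test instance
neighbours = [row[3] for row in k_nearest(query)]

# numeric class -> regression: average of the neighbours' values
regression_prediction = sum(neighbours) / len(neighbours)
# nominal class -> classification: most common label among the neighbours
classification_prediction = Counter(neighbours).most_common(1)[0][0]
```

Here the three nearest neighbours have classes 1, 2 and 2, so the regression answer is about 1.67 while the classification answer is the label 2; this is why a numeric class attribute is scored with a correlation coefficient while a nominal one is scored with accuracy.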
To get the result you want, you need to tell Weka to treat this attribute as nominal. When you load the CSV file in the Preprocess tab, check Invoke options dialog in the file dialog window. Then when you click Open, you'll get this window:
The field nominalAttributes is where you can give Weka a list of which attributes are nominal ones even if they look numeric. Entering 4 here will specify that the fourth attribute (column) in the input is a nominal attribute. Now IBk should behave as you expect.
You could also do this by applying the NumericToNominal unsupervised attribute filter to the already loaded data, again specifying attribute 4 otherwise the filter will apply to all the attributes.
The ARFF format used for the Weka sample datasets includes a specification of which attributes are which type. After you've imported (or filtered) your dataset as above, you can save it as ARFF and you'll then be able to reload it without having to go through the same process.
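For reference, the saved ARFF header makes the attribute types explicit; for the sample above it would look roughly like this (the relation name is arbitrary):

```
@relation sample

@attribute a numeric
@attribute b numeric
@attribute c numeric
@attribute d {1,2}

@data
74,85,123,1
73,84,122,1
```

The `{1,2}` declaration on attribute d is what tells Weka it is nominal, so reloading this file needs no further options.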

methods does not give confusion matrix in weka

I want to do classification in Weka. I am using some methods (Random Tree, Random Forest, Decision Table, RandomSubSpace, ...), but they give results like the ones below.
=== Cross-validation ===
=== Summary ===
Correlation coefficient 0.1678
Mean absolute error 0.4832
Root mean squared error 0.4931
Relative absolute error 96.6501 %
Root relative squared error 98.6323 %
Total Number of Instances 100000
However, I want results in the form of accuracy and a confusion matrix. How can I get results like that?
Note: When I use a small dataset, it gives results with a confusion matrix. Could it be related to the size of the dataset?
The output of the training/testing in Weka depends on the type of the attribute that you are trying to predict. If your attribute is nominal, you will get a confusion matrix and accuracy value. If your attribute is numeric, you will get a correlation coefficient.
In the small and large datasets that you mention, what is the type of the attribute you are predicting?
I have run a 2-class problem using J48 and RandomForest with 100,000 instances, and the confusion matrix appeared correctly. I additionally increased the problem complexity to 20 different classes, and the confusion matrix again appeared correctly.
If you look under More options..., please ensure that 'Output confusion matrix' is checked, and see if this resolves the issue.
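To illustrate what those statistics mean for a nominal class, here is a short sketch of how accuracy and a confusion matrix are derived from predictions; the actual/predicted labels are made up for illustration:

```python
# matrix[i][j] = number of instances of class labels[i] classified as labels[j];
# accuracy is the fraction on the matrix diagonal.
from collections import Counter

actual    = ["1", "1", "1", "1", "2", "2", "2", "2"]
predicted = ["1", "1", "1", "2", "1", "2", "2", "2"]

labels = sorted(set(actual))
pairs = Counter(zip(actual, predicted))
matrix = [[pairs[(a, p)] for p in labels] for a in labels]

correct = sum(matrix[i][i] for i in range(len(labels)))
accuracy = correct / len(actual)  # "Correctly classified instances" rate
```

With these labels the matrix is [[3, 1], [1, 3]] and the accuracy is 75%; Weka can only produce this kind of output when the class attribute is nominal.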

Multinomial Naive Bayes raises error

1) Applying MultinomialNaiveBayes (and no other classifier) in Weka raises the exception "problem evaluating classifier: Numeric attribute values must all be greater or equal to zero". How can I fix it?
2) Is dimensionality reduction (PCA, LSI, random projection) an alternative to feature selection (InformationGain, ChiSqr), or do we need to apply both? I have seen conflicting opinions about them on the internet.
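On the first question: the multinomial model treats features as counts, so it rejects negative values. One common workaround (a sketch of preprocessing you could apply before training, not an official Weka recipe, and note it changes the meaning of the features) is to shift each attribute so its minimum becomes zero:

```python
# Shift each numeric attribute so its minimum is 0, making all values
# non-negative as the multinomial model requires. 'dataset' is a list of
# feature rows (hypothetical example data).
def shift_nonnegative(dataset):
    mins = [min(col) for col in zip(*dataset)]
    return [[v - m for v, m in zip(row, mins)] for row in dataset]

dataset = [[-1.5, 2.0], [0.5, -3.0], [1.0, 0.0]]
shifted = shift_nonnegative(dataset)  # every value is now >= 0
```

If the negative values come from a transform such as TF-IDF with centering, it may be better to drop the centering step instead of shifting after the fact.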

informatica value larger than specified precision allowed for this column

I tried to load a table ADuplicate, which is a duplicate of table A, using a direct one-to-one mapping in Informatica.
But I got the following error:
"Value larger than specified precision allowed for this column"
I noticed that the C4 column, which is NUMBER(15) in both tables, has the problem while loading.
The data that fails to load is 200000300123 and -1000000000000000000000000000000000000000000.
My doubts are:
This value is available in the source with the same precision. Why doesn't it get into the target?
When I changed the target column C4 to just a Number field, I could insert this value manually using TOAD, but why couldn't I do the same using Informatica?
Please help me out.
Thanks in advance
Shanmugam
Do you have some transformation between the source and target that sets a different precision for this port? Especially the one just before the target?
The data written to the target has a higher precision, possibly set higher in some transformation(s) in the middle. You could test with an Expression transformation in the middle to reduce the precision.
Try checking "Enable high precision", which is available on the "Properties" tab in the session properties.
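As a quick diagnostic outside Informatica, you could check which source values would actually overflow a NUMBER(15) column before loading; a sketch using the two values from the question:

```python
# Flag integer values whose digit count exceeds the target column precision.
# NUMBER(15) holds at most 15 significant digits.
def exceeds_precision(value, precision=15):
    return len(str(abs(int(value)))) > precision

values = [200000300123, -1000000000000000000000000000000000000000000]
too_big = [v for v in values if exceeds_precision(v)]
```

Note that 200000300123 has only 12 digits and fits NUMBER(15), which supports the suggestion above that a transformation in the mapping, rather than the target column itself, is the likely source of the precision mismatch.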