Why isn't any algorithm working in WEKA when my dataset is fully loaded?

I am trying to analyze a dataset in WEKA with a nominal class. The other attributes are a mix of numeric and nominal, and the class attribute is nominal. However, only a very few algorithm options are available; the rest do not show up. Can you please tell me why this is happening?

Related

VertexAI Tabular AutoML rejecting rows containing nulls

I am trying to build a binary classifier based on a tabular dataset that is rather sparse, but training is failing with the following message:
Training pipeline failed with error message: Too few input rows passed validation. Of 1169548 inputs, 194 were valid. At least 50% of rows must pass validation.
My understanding was that tabular AutoML should be able to handle null values, so I'm not sure what's happening here, and I would appreciate any suggestions. The documentation explicitly mentions reviewing each column's nullability, but I don't see any way to set or check a column's nullability on the dataset tab (perhaps the documentation is out of date?). The documentation also states that missing values are treated as null, which is how I've set up my CSV. However, the documentation for the numeric type does not explicitly list support for missing values, only NaN and inf.
The dataset has 1 million rows and 34 columns, and only 189 rows are null-free. My sparsest column has data in 5,000 unique rows, and the next two rarest have data in 72k and 274k rows, respectively. Columns are a mix of categorical and numeric, and only a handful of columns have no nulls.
The data is stored as a CSV, and the dataset import seems to run without issue. Generate statistics ran on the dataset, but for some reason the missing % column failed to populate. What might be the best way to address this? I'm not sure whether I need to change my null representation in the CSV, change some dataset/training setting, or whether it's an AutoML bug (less likely). Thanks!
To allow invalid and null values during training and prediction, you have to explicitly set the "allow invalid values" flag to Yes during training. You can find this setting under the model training settings on the dataset page. The flag has to be set column by column.
I tried @Kabilan Mohanraj's suggestion and it resolved my issue. What I had to do was click the dropdown to allow invalid values into training. After making this change, all rows passed validation and my model was able to train without issue. I'd initially assumed that missing values would not count as invalid, which was incorrect.
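The numbers in the question suggest why the flag matters: if every row containing at least one null is treated as invalid, only the null-free rows survive (189 by the asker's count, close to the 194 reported as valid). A minimal Python sketch for checking this locally before import, assuming nulls are written as empty cells in the CSV; it also computes the per-column missing percentage that the statistics view failed to show. The function name `null_profile` is hypothetical and not part of any Vertex AI tooling.

```python
import csv
import io
from typing import Dict, Iterable, Tuple


def null_profile(lines: Iterable[str]) -> Tuple[Dict[str, float], int, int]:
    """Return (fraction missing per column, null-free row count, total row count).

    Assumption: a null is written as an empty or whitespace-only cell.
    """
    reader = csv.DictReader(lines)
    columns = list(reader.fieldnames or [])
    missing = {col: 0 for col in columns}
    total = 0
    null_free = 0
    for row in reader:
        total += 1
        row_has_null = False
        for col in columns:
            value = row.get(col)
            # A short row leaves trailing fields as None (DictReader's default restval).
            if value is None or value.strip() == "":
                missing[col] += 1
                row_has_null = True
        if not row_has_null:
            null_free += 1
    fractions = {col: missing[col] / total if total else 0.0 for col in columns}
    return fractions, null_free, total


# Tiny in-memory example: only the third data row has no empty cell.
sample = io.StringIO("a,b,c\n1,x,\n2,,3\n3,y,4\n,z,5\n")
fractions, null_free, total = null_profile(sample)
print(fractions, null_free, total)  # {'a': 0.25, 'b': 0.25, 'c': 0.25} 1 4
```

For a real file, pass an open text stream, e.g. `with open("data.csv", newline="") as f: null_profile(f)` (the file name is a placeholder). The CSV is read one row at a time, so a million-row file does not need to fit in memory.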

How to build tree model in SAS by specifying groups for cross validation

I'd like to build a tree model with cross-validation. I'd like to assign observations to different cross-validation groups by myself instead of random sampling. I am not sure how SAS can do this. It would be helpful if anybody can share some example SAS code.
Many thanks in advance
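The underlying idea is a fold-ID variable: each observation carries the fold label you assign, and fold k is validated on exactly the observations labelled k while the tree is trained on all the others. The sketch below is plain Python, not SAS code, and only shows how the splits are formed from such a variable; the tree fitting is left out, and `predefined_folds` is a hypothetical helper name. In SAS, the same splits could be driven by looping over the values of your own fold variable.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterator, List, Sequence, Tuple


def predefined_folds(
    fold_ids: Sequence[Hashable],
) -> Iterator[Tuple[Hashable, List[int], List[int]]]:
    """Yield (fold label, training indices, validation indices), one fold at a time.

    fold_ids[i] is the fold the user assigned to observation i; all observations
    sharing a label are held out together.
    """
    members: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, fold in enumerate(fold_ids):
        members[fold].append(idx)
    # Reproducible fold order (labels compared as strings).
    for fold in sorted(members, key=str):
        held_out = set(members[fold])
        train = [i for i in range(len(fold_ids)) if i not in held_out]
        yield fold, train, members[fold]


# Six observations assigned by hand to three folds.
for fold, train_idx, valid_idx in predefined_folds([1, 2, 1, 3, 2, 3]):
    # Fit the tree on train_idx here and score it on valid_idx.
    print(fold, train_idx, valid_idx)
```

Because the folds come from your own variable rather than random sampling, rerunning the procedure always produces the same splits.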

How do I fix the error "train and test set are not compatible"?

I have a dataset of about 7000 records. After cleaning it, I applied normalization and discretization. Then I trained a J48 model on it and saved the model to my computer. Now I want to test this model on a dataset of 500 records. All columns in this dataset are the same as in the original dataset, except that the "class" column in the test dataset has no values. When I tried, I got an error, so I also applied normalization and discretization to the test dataset, but I still get the error. Note that I specified the class attribute in both datasets, yet the error is still displayed.
I attached screenshots of my test file (test.arff), my training dataset file, and the errors.
Thanks for the screenshots. The attribute "code" does not have the same values in the training and test set.
It looks like that is a case identifier, so you wouldn't expect the values to be the same. So, instead of having this as a nominal attribute, treat it as a numeric attribute.
@attribute code numeric
Let me know if this fixes the problem.
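Mismatched nominal declarations like the one on `code` can be spotted before applying the saved model. A minimal Python sketch, assuming simple ARFF headers whose attribute names and nominal values contain no spaces, quotes, or commas; it is not Weka's own compatibility check, and the helper names are hypothetical. Nominal values are compared as ordered lists, which is deliberately strict.

```python
import re
from typing import Dict, List, Tuple, Union

# One "@attribute <name> <spec>" line; the keyword is matched case-insensitively.
# Simplification: names and nominal values must not contain spaces, quotes, or commas.
ATTR_RE = re.compile(r"^\s*@attribute\s+(\S+)\s+(.+?)\s*$", re.IGNORECASE)

Spec = Union[str, Tuple[str, ...]]


def parse_header(arff_text: str) -> Dict[str, Spec]:
    """Map attribute name -> tuple of nominal values, or the lowercased type name."""
    attrs: Dict[str, Spec] = {}
    for line in arff_text.splitlines():
        if line.strip().lower().startswith("@data"):
            break  # the header ends here
        match = ATTR_RE.match(line)
        if not match:
            continue
        name, spec = match.group(1), match.group(2)
        if spec.startswith("{") and spec.endswith("}"):
            attrs[name] = tuple(v.strip() for v in spec[1:-1].split(","))
        else:
            attrs[name] = spec.lower()
    return attrs


def header_mismatches(train_arff: str, test_arff: str) -> List[str]:
    """List attributes whose declarations differ between two ARFF headers."""
    train, test = parse_header(train_arff), parse_header(test_arff)
    problems: List[str] = []
    if list(train) != list(test):
        problems.append("attribute names or order differ")
    for name, spec in train.items():
        if name in test and test[name] != spec:
            problems.append(name)
    return problems


train_hdr = "@relation r\n@attribute code {a1,a2}\n@attribute class {yes,no}\n@data\n"
test_hdr = "@relation r\n@attribute code {b7}\n@attribute class {yes,no}\n@data\n"
print(header_mismatches(train_hdr, test_hdr))  # ['code']
```

With `code` declared as `numeric` in both headers, the same check no longer reports it.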

Analyzing quantitative data in WEKA

Hi, my dataset contains only quantitative (numeric) data and has no class attribute. It contains sales figures for different years, and I need to analyze the data in different ways. Can I use WEKA for this analysis? I tried the WEKA tool, but it seemed I cannot proceed unless the dataset has a class variable. Please give me a hint.

Converting mixed data set to numerical data set

In my project I have to work with a mixed dataset (i.e., it has both categorical and numerical data). Is there an algorithm or method for converting categorical values to numerical values, so that my final dataset contains only numerical values? Can anyone please help me out?
(I'm doing my project in MATLAB.)
Use one-hot encoding.
But don't expect the results to be very good. There is a lot of meaning lost this way.
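A minimal sketch of one-hot encoding, written in Python for illustration rather than in MATLAB: each categorical column is replaced by one 0/1 column per distinct category, and numeric columns are passed through as floats. The function name and the example columns are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def one_hot_encode(
    header: Sequence[str],
    rows: Sequence[Sequence[object]],
    categorical: Sequence[int],
) -> Tuple[List[str], List[List[float]]]:
    """Replace each categorical column with one 0/1 column per distinct value.

    Non-categorical columns are converted to float and otherwise kept as they are.
    """
    cat = set(categorical)
    # Distinct values of each categorical column, in first-seen order.
    levels: Dict[int, List[object]] = {j: [] for j in cat}
    for row in rows:
        for j in cat:
            if row[j] not in levels[j]:
                levels[j].append(row[j])

    new_header: List[str] = []
    for j, name in enumerate(header):
        if j in cat:
            new_header.extend(f"{name}={level}" for level in levels[j])
        else:
            new_header.append(name)

    encoded: List[List[float]] = []
    for row in rows:
        out: List[float] = []
        for j, value in enumerate(row):
            if j in cat:
                out.extend(1.0 if value == level else 0.0 for level in levels[j])
            else:
                out.append(float(value))
        encoded.append(out)
    return new_header, encoded


cols, data = one_hot_encode(
    ["age", "colour", "size"],
    [[25, "red", "S"], [30, "blue", "M"], [41, "red", "L"]],
    categorical=[1, 2],
)
print(cols)  # ['age', 'colour=red', 'colour=blue', 'size=S', 'size=M', 'size=L']
print(data[0])  # [25.0, 1.0, 0.0, 1.0, 0.0, 0.0]
```

This also shows the caveat in the answer: the categories become independent indicator columns, so any ordering among them (S < M < L here) is discarded, and a column with many distinct values turns into many sparse columns.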