I have a dataset of about 7000 records. After cleaning it, I applied normalization and discretization. Then I trained a J48 model on it and saved the model to my computer. Now I want to test this model on a dataset of 500 records. All columns in this dataset are the same as in the original dataset, except that the "class" column in the test dataset has no values. When I ran the test I got an error, so I also applied normalization and discretization to the test dataset, but I still get the same error. Note that I specified the class attribute in both datasets, and the error was still displayed.
This is a screenshot of my test file: [image: test ARFF file]
And this is a screenshot of my train dataset file: [image: train ARFF file]
Can anybody help me?
Related
I am trying to analyze a dataset in WEKA with a nominal class. The other attributes are a mix of numeric and nominal, and the final class attribute is nominal. However, all but a very few of the algorithm options are unavailable. Can you please tell me why this is happening?
I am trying to build a binary classifier based on a tabular dataset that is rather sparse, but training is failing with the following message:
Training pipeline failed with error message: Too few input rows passed validation. Of 1169548 inputs, 194 were valid. At least 50% of rows must pass validation.
My understanding was that tabular AutoML should be able to handle null values, so I'm not sure what's happening here, and I would appreciate any suggestions. The documentation explicitly mentions reviewing each column's nullability, but I don't see any way to set or check a column's nullability on the dataset tab (perhaps the documentation is out of date?). Additionally, the documentation explicitly mentions that missing values are treated as null, which is how I've set up my CSV. The documentation for numeric columns, however, does not explicitly list support for missing values, only NaN and inf.
The dataset is 1 million rows, 34 columns, and only 189 rows are null-free. My most sparse column has data in 5,000 unique rows, with the next two rarest having data in 72k and 274k rows respectively. Columns are a mix of categorical and numeric, with only a handful of columns without nulls.
The data is stored as a CSV, and the Dataset import seems to run without issue. Generate statistics ran on the dataset, but for some reason the missing % column failed to populate. What might be the best way to address this? I'm not sure if this is a case where I need to change my null representation in the CSV, change some dataset/training setting, or if it's an AutoML bug (less likely). Thanks!
To allow invalid & null values during training & prediction, we have to explicitly set the allow invalid values flag to Yes during training as shown in the image below. You can find this setting under model training settings on the dataset page. The flag has to be set on a column by column basis.
I tried @Kabilan Mohanraj's suggestion and it resolved my issue. What I had to do was click the dropdown to allow invalid values into training. After making this change, all rows passed validation and my model was able to train without issue. I'd initially assumed that missing values would not count as invalid, which was incorrect.
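Before training, it can help to confirm how many rows are actually null-free, since that is roughly what the "Too few input rows passed validation" message is reporting. The following is a minimal sketch, assuming missing values are stored as empty fields (as described in the question). The function name `count_complete_rows` and the sample columns are hypothetical, and this does not reproduce AutoML's own validation logic.

```python
import csv
import io


def count_complete_rows(csv_file, null_tokens=("",)):
    """Return (rows with no missing field, total rows) for a CSV with a header row."""
    reader = csv.reader(csv_file)
    header = next(reader)
    complete = total = 0
    for row in reader:
        if not row:
            continue  # skip blank lines
        total += 1
        # A row is complete only if it has every column and no field is a null token.
        if len(row) == len(header) and all(
            field.strip() not in null_tokens for field in row
        ):
            complete += 1
    return complete, total


# Hypothetical sample data; empty fields stand for nulls.
sample = io.StringIO(
    "age,income,segment\n"
    "34,52000,a\n"
    ",48000,b\n"
    "41,,\n"
    "29,61000,c\n"
)
complete, total = count_complete_rows(sample)
print(f"{complete} of {total} rows are null-free")
```

If the count of null-free rows is close to the number of valid rows in the error message, the rows are most likely being rejected for their nulls, which is the case the per-column "allow invalid values" setting addresses.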
I have a dataset of about 7000 records. After cleaning it, I applied normalization and discretization. Then I trained a J48 model on it and saved the model to my computer. Now I want to test this model on a dataset of 500 records. All columns in this dataset are the same as in the original dataset, except that the "class" column in the test dataset has no values. When I ran the test I got an error, so I also applied normalization and discretization to the test dataset, but I still get the same error. Note that I specified the class attribute in both datasets, and the error was still displayed.
This is a screenshot of my test file: [image: test.arff screenshot]
And this is a screenshot of my train dataset file: [image]
And these are screenshots of the errors: [image]
Thanks for the screenshots. The attribute "code" does not have the same values in the training and test set.
It looks like that is a case identifier, so you wouldn't expect the values to be the same. So, instead of having this as a nominal attribute, treat it as a numeric attribute.
@attribute code numeric
Let me know if this fixes the problem.
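To find this kind of mismatch without comparing screenshots by eye, the two ARFF headers can be compared programmatically. The sketch below is a rough illustration using only the Python standard library, not a Weka API. The helper names `read_attributes` and `header_mismatches` and the miniature headers are hypothetical, and the parser only handles simple single-line `@attribute` declarations.

```python
import re

# Matches an ARFF attribute declaration: @attribute <name> <type or nominal values>
_ATTR_RE = re.compile(
    r"^\s*@attribute\s+('[^']*'|\"[^\"]*\"|\S+)\s+(.*?)\s*$", re.IGNORECASE
)


def read_attributes(arff_text):
    """Return {attribute name: declared type} from the header of an ARFF document."""
    attributes = {}
    for line in arff_text.splitlines():
        if line.strip().lower().startswith("@data"):
            break  # the header ends where the data section starts
        match = _ATTR_RE.match(line)
        if match:
            name = match.group(1).strip("'\"")
            # Drop whitespace so "{a, b}" and "{a,b}" compare equal.
            attributes[name] = re.sub(r"\s+", "", match.group(2))
    return attributes


def header_mismatches(train_text, test_text):
    """List attributes whose declarations differ between the two headers."""
    train = read_attributes(train_text)
    test = read_attributes(test_text)
    names = list(train) + [n for n in test if n not in train]
    return [n for n in names if train.get(n) != test.get(n)]


# Hypothetical miniature headers, not the poster's real files.
train_header = """@relation train
@attribute code {1,2,3}
@attribute age numeric
@attribute class {yes,no}
@data
"""
test_header = """@relation test
@attribute code {4,5}
@attribute age numeric
@attribute class {yes,no}
@data
"""

print(header_mismatches(train_header, test_header))
```

Any attribute this reports is declared differently in the two files, which is what happens when an identifier column such as "code" is declared nominal and therefore lists different values in each file.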
I am performing PCA on my dataset using WEKA (filters > unsupervised > attribute > PrincipalComponents). Once I apply the filter, I get the principal components. However, I am not able to export them to a separate file for further processing. How do I export the first 3 principal components to a CSV or TXT file from Weka?
The "Save..." button at the top right of the Preprocess tab in Weka Explorer will export your PCA-filtered data. You will be prompted for the name and type of file you'd like to export to.
You can control the number of allowed principal components via the -M parameter to the filter, or you could export to a .csv file, open in a spreadsheet application, and remove all but the first three columns.
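As an alternative to trimming the columns in a spreadsheet, the exported file can be cut down with a short script. This is a minimal sketch using Python's standard `csv` module. The function name `keep_first_columns` and the sample column names are hypothetical, and it assumes the principal components are the first columns of the exported file.

```python
import csv
import io


def keep_first_columns(src, dst, n=3):
    """Copy a CSV from src to dst, keeping only the first n columns of each row."""
    writer = csv.writer(dst, lineterminator="\n")
    for row in csv.reader(src):
        writer.writerow(row[:n])


# Hypothetical stand-in for a file saved from the Preprocess tab.
exported = io.StringIO(
    "pc1,pc2,pc3,pc4,class\n"
    "0.12,-1.30,0.45,0.02,yes\n"
    "-0.87,0.44,1.10,-0.31,no\n"
)
trimmed = io.StringIO()
keep_first_columns(exported, trimmed)
print(trimmed.getvalue())
```

With real files, the two `io.StringIO` objects would be replaced by files opened with `open(path, newline="")`. Note that this also drops the class column if it comes after the components.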
I am trying to select the best attributes for my training dataset, which contains only numeric attributes. Which attribute evaluator/method would yield the best results for about 10 or so attributes? The training dataset is about 1400 lines of population statistics data.