I had an Excel sheet that I converted to an ARFF file using an online converter, but when I try to open it in WEKA 3.8, it shows me this error:
I have attached an image of the dialog box that pops up. Please help me out.
Thanks in advance
You can just open the CSV file with the Weka Explorer and save it as ARFF if you really need it in that format.
EDIT:
Your problem is not the file format. It seems that all of your values are numeric, including the target. J48 is a classification algorithm, so Weka won't let you use it when the class attribute is numeric.
Which column in the data is the target?
If you want to use a classification algorithm, you need to do one of the following: apply a numeric-to-nominal filter (for example weka.filters.unsupervised.attribute.NumericToNominal) to the target feature, use an ARFF file that declares the target column as nominal, or rename the values of the column to non-numeric values. Here is a link to an ARFF file where the last column (race) is defined as nominal: https://drive.google.com/open?id=0B7b0iysQV1SEcjJJUE1lc19fR2c
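As a rough illustration of the second option, here is a minimal Python sketch that turns CSV text into ARFF text with a nominal target. The sample data, the column names `age` and `income`, and the helper name `csv_to_arff` are made up for the example; it also assumes every non-target column is numeric and that attribute names contain no spaces.

```python
import csv
import io


def csv_to_arff(csv_text, relation, target):
    """Return ARFF text where `target` is nominal and every other column is numeric."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    t = header.index(target)
    # Distinct target values, in first-seen order, become the nominal label set.
    labels = list(dict.fromkeys(row[t] for row in data))
    lines = [f"@relation {relation}", ""]
    for i, name in enumerate(header):
        if i == t:
            lines.append(f"@attribute {name} {{{','.join(labels)}}}")
        else:
            # Assumption: all non-target columns hold numbers.
            lines.append(f"@attribute {name} numeric")
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


# Hypothetical sample data; only the column name "race" comes from the thread.
sample_csv = "age,income,race\n25,40000,1\n37,52000,2\n41,61000,1\n"
arff_text = csv_to_arff(sample_csv, "people", "race")
print(arff_text)
```

Because the header declares `race` with an explicit label set in braces, Weka should treat it as a nominal class even though the labels look like numbers.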
Suppose I have a CSV file, where the first column of data is a date with format yyyy-MM-dd HH:mm:ss, while the second column is a date with format yyyy-MM-dd HH:mm. How can I import the CSV file into the Weka Explorer such that both attributes have the "date" type?
I understand that in the Weka Explorer's "Preprocess" tab's "Open file ..." dialog, I can select "Invoke options dialog" to customize the data types of the imported attributes:
However, the resulting configuration window only allows me to specify one dateFormat:
How can I solve the problem? Do I have to manually convert the CSV file into an ARFF file by editing the CSV file in a text editor?
The ADAMS framework has the weka.filters.unsupervised.attribute.StringToDate filter that allows the conversion of a range of attributes using a specific date format string. You can wrap multiple of these filters in a weka.filters.MultiFilter to convert all of your various date columns in one go. The MultiFilter would get applied to the initially loaded CSV file, where all the date columns are just string columns.
ADAMS also offers the Weka Investigator, a more powerful tool than the Weka Explorer: multi-session support, multiple files loaded at the same time (you can load train and test files and just select the files from a combobox then), multiple tabs of the same kind (not just a single Classify tab), different types of cross-validation (batched/non-batched), predefined output generators that get applied to evaluations (model, statistics, classifier errors, ...), etc.
Also, for automating all these steps, you can use ADAMS' Flow editor, a workflow engine.
If you don't want to use ADAMS: load the CSV file in the Weka Explorer, save it as ARFF then manually edit the relevant attribute types (see ARFF format) before reloading it in the Weka Explorer.
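The manual edit comes down to giving each date attribute its own format string in the ARFF header. A minimal Python sketch of that header, assuming a two-column CSV whose column names (`start`, `end`) are invented for the example, and with `csv_dates_to_arff` as a hypothetical helper:

```python
import csv
import io

# Hypothetical column names mapped to their date patterns (Java SimpleDateFormat style).
DATE_FORMATS = {
    "start": "yyyy-MM-dd HH:mm:ss",
    "end": "yyyy-MM-dd HH:mm",
}


def csv_dates_to_arff(csv_text, relation, date_formats):
    """Build ARFF text where every column is a date attribute with its own format."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = [f"@relation {relation}", ""]
    for name in header:
        lines.append(f'@attribute {name} date "{date_formats[name]}"')
    lines += ["", "@data"]
    for row in data:
        # Date values contain spaces, so each one is quoted in the data section.
        lines.append(",".join(f'"{value}"' for value in row))
    return "\n".join(lines) + "\n"


sample = "start,end\n2016-01-01 10:00:00,2016-01-01 10:30\n"
arff_dates = csv_dates_to_arff(sample, "events", DATE_FORMATS)
print(arff_dates)
```

The resulting text can be saved with an .arff extension and loaded in the Explorer; both attributes should then show up with the date type.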
I have a pandas dataframe that looks like this:
A B C
1 2 =A2+B2
3 4 =A3+B3
I write this data frame to an Excel file using xlsxwriter in Python. Now, when I read the Excel file back from Python, I get 0.0 as the value for C2 and not 3 (=A2+B2). However, if I open the file in Excel manually, the formulas are evaluated and C2 contains 3. So the problem occurs while reading from code.
Is there a way in Python to read Excel columns with formulas as values?
So the problem is while reading from code.
Not really.
The issue is that XlsxWriter doesn't write the value of a formula to an Excel file. From the XlsxWriter FAQ:
Formula results displaying as zero in non-Excel applications
Due to wide range of possible formulas and interdependencies between them XlsxWriter doesn’t, and realistically cannot, calculate the result of a formula when it is written to an XLSX file. Instead, it stores the value 0 as the formula result. It then sets a global flag in the XLSX file to say that all formulas and functions should be recalculated when the file is opened.
This is the method recommended in the Excel documentation and in general it works fine with spreadsheet applications. However, applications that don’t have a facility to calculate formulas, such as Excel Viewer, or several mobile applications, will only display the 0 results.
If required, it is also possible to specify the calculated result of the formula using the optional value parameter in write_formula():
worksheet.write_formula('A1', '=2+2', num_format, 4)
I would save the Excel file as a .csv. Excel should automatically convert all formulae to values. You can then read the .csv into Python with the usual file methods.
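A minimal sketch of the reading step, using only the standard library. The CSV text below is a stand-in for what Excel would export from the sheet in the question; with a real file you would pass `open("book.csv", newline="")` (a hypothetical file name) to the reader instead.

```python
import csv
import io

# Stand-in for the .csv saved from Excel: column C already holds values, not formulas.
exported = "A,B,C\n1,2,3\n3,4,7\n"

rows = list(csv.DictReader(io.StringIO(exported)))
# CSV cells are read as strings, so convert them explicitly.
c_values = [float(row["C"]) for row in rows]
print(c_values)
```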
Answers to this question explain how to output classification predictions to CSV in Weka in both Weka 3.6 (right / option-click model and then save predictions) and 3.7 (choose more options and select Output predictions).
In Weka 3.7, I chose more options, selected Output predictions, and chose CSV as the specific type of output. An answer suggests to "Click on 'outputFile' and select a folder and type a filename." However, I cannot see 'outputFile' or where the CSV output is saved.
Where is the output file saved, or how can I click on 'outputFile' to name the output?
In Weka 3.7.12 on OS X, I was able to find 'outputFile' and the other options by clicking on the white box containing CSV (after choosing CSV first), much like how you specify the options for certain classifiers by clicking on those white boxes. I wasn't able to type a filename, but if I created a blank file manually, I was able to save the predictions to that file.
If I left-click on CSV (once I've selected it), WEKA allows me to select an existing CSV file to save the predictions to.
I am performing PCA on my dataset using WEKA (filter > unsupervised > principal components). Once I apply the filter, I get the principal components. However, I am not able to export them to a separate file for further processing. How do I export the first 3 principal components to a CSV or a txt file from Weka?
The "Save..." button at the top right of the Preprocess tab in Weka Explorer will export your PCA-filtered data. You will be prompted for the name and type of file you'd like to export to.
You can control the number of allowed principal components via the -M parameter to the filter, or you could export to a .csv file, open in a spreadsheet application, and remove all but the first three columns.
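If you would rather trim the exported file in code than in a spreadsheet, here is a small Python sketch that keeps only the first three columns. The sample CSV and its short column names are invented for the example (Weka's actual component names are much longer), and `first_columns` is a hypothetical helper.

```python
import csv
import io


def first_columns(csv_text, n=3):
    """Return CSV text containing only the first `n` columns of `csv_text`."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow(row[:n])
    return out.getvalue()


# Hypothetical export: four components followed by the class column.
exported = "pc1,pc2,pc3,pc4,class\n0.5,-1.2,0.3,0.9,a\n1.1,0.4,-0.7,0.2,b\n"
trimmed = first_columns(exported)
print(trimmed)
```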
I was once able to print an Excel equation directly to a .CSV file (which opens in Excel), and Excel handled it just as it usually does when there's an equation in one of its columns.
Ex: fprintf(fp, "\"=COUNTIF(R%d:AG%d,\"\">0\"\")*1.25\",", x, x);
I often print my metrics as comma separated values, since I don't know how to write directly to a new Excel file. And if there is an equation, say like the one above, Excel will compute the equation and print the values correctly.
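For comparison, the same trick can be sketched in Python, where the csv module takes care of the quoting that the fprintf call does by hand. The row number 2 is an arbitrary value chosen for the example.

```python
import csv
import io

x = 2  # arbitrary row number, used only for illustration
formula = f'=COUNTIF(R{x}:AG{x},">0")*1.25'

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
# The field contains a comma and double quotes, so the writer wraps it in
# quotes and doubles the inner quotes, which is the escaping CSV expects.
writer.writerow([formula])
line = out.getvalue()
print(line)
```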
I was wondering whether there is a similar equation that plots the values in two columns against each other, so that when I open the Excel file, the plot opens too?
Has anybody done anything like this? I wish I had some Excel libraries or DLLs that expose Excel APIs for formatting columns, plotting graphs, and so on. Is something like that available out there? [Did some google-fu, didn't find anything useful. I know there's an Excel Perl module available to format outputs, save as an Excel file, etc.]
Thanks for help.