How do I import a CSV file that has columns with different date formats into Weka?

Suppose I have a CSV file, where the first column of data is a date with format yyyy-MM-dd HH:mm:ss, while the second column is a date with format yyyy-MM-dd HH:mm. How can I import the CSV file into the Weka Explorer such that both attributes have the "date" type?
I understand that in the "Open file ..." dialog of the Weka Explorer's "Preprocess" tab, I can select "Invoke options dialog" to customize the data types of the imported attributes.
However, the resulting configuration window only allows me to specify one dateFormat.
How can I solve the problem? Do I have to manually convert the CSV file into an ARFF file by editing the CSV file in a text editor?

The ADAMS framework has the weka.filters.unsupervised.attribute.StringToDate filter, which converts a range of attributes using a specific date format string. You can wrap several of these filters, one per date format, in a weka.filters.MultiFilter to convert all of your date columns in one go. The MultiFilter is applied to the initially loaded CSV file, in which all the date columns are just string columns.
ADAMS also offers the Weka Investigator, a more powerful tool than the Weka Explorer: multi-session support, multiple files loaded at the same time (for example, you can load train and test files and then select them from a combobox), multiple tabs of the same kind (not just a single Classify tab), different types of cross-validation (batched/non-batched), predefined output generators that get applied to evaluations (model, statistics, classifier errors, ...), etc.
Also, for automating all these steps, you can use ADAMS' Flow editor, a workflow engine.
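If you don't want to use ADAMS: load the CSV file in the Weka Explorer, save it as ARFF, then manually edit the relevant attribute types (see ARFF format) before reloading it in the Weka Explorer. In ARFF, a date attribute is declared as @attribute <name> date "<format>", so each column can carry its own format string.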
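If editing the ARFF file by hand is impractical, the same conversion can be scripted. The following is a minimal sketch, not a Weka or ADAMS tool: it uses only the Python standard library and writes an ARFF header in which each date column gets its own format string. The function name csv_to_arff, the column names start, end and label, and the relation name demo are all made up for illustration. It also assumes that column names contain no spaces and that every non-date column can be treated as a string attribute.

```python
import csv
import io


def csv_to_arff(csv_text, relation, date_formats):
    """Convert CSV text to ARFF text.

    date_formats maps a column name to a Java SimpleDateFormat pattern.
    Every other column is declared as string (a simplifying assumption).
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]

    lines = ["@relation " + relation, ""]
    for name in header:
        if name in date_formats:
            # Each date attribute carries its own format string.
            lines.append('@attribute %s date "%s"' % (name, date_formats[name]))
        else:
            lines.append("@attribute %s string" % name)

    lines += ["", "@data"]
    for row in data:
        cells = []
        for value in row:
            if value == "":
                # ARFF marks a missing value with a question mark.
                cells.append("?")
            else:
                # Quote values so that embedded spaces survive parsing.
                escaped = value.replace("\\", "\\\\").replace('"', '\\"')
                cells.append('"%s"' % escaped)
        lines.append(",".join(cells))
    return "\n".join(lines) + "\n"


# Hypothetical sample data with the two formats from the question.
sample = (
    "start,end,label\n"
    "2020-01-02 03:04:05,2020-01-02 03:04,a\n"
    "2020-01-03 10:00:00,,b\n"
)
arff = csv_to_arff(
    sample,
    "demo",
    {"start": "yyyy-MM-dd HH:mm:ss", "end": "yyyy-MM-dd HH:mm"},
)
print(arff)
```

The resulting text can be saved with an .arff extension and opened in the Weka Explorer, where both columns should then load with the date type.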

Related

How Can I Stop Power Query From Cutting Off Data From The Left Side Of PDF Files It Imports

I'm trying to use Power Query to extract data from PDF files. When I load the data, Power Query cuts off up to 10-12 characters on the left side of the page. For example:
Actual Data in PDF file column 1: MALE 2YO 5NOV MS2
What Power Query Imports: 5NOV MS2
Data Cut off from Column 1: MALE 2YO
It's almost as if Power Query is importing the data under a page specification that is not consistent with the page specification of the PDF file. Is there a way to adjust this so that I can capture all the data in the PDF document?

Error While opening ARFF file in WEKA

I had an Excel sheet that I converted to an ARFF file using an online tool, but when I tried to open it in version 3.8 of Weka, it showed me an error.
I have attached an image of the dialog box that pops up. Please help me out.
Thanks in advance
You can just open the CSV file with the Weka Explorer and save it as ARFF if you really need it in that format.
EDIT:
Your problem is not the file format. It seems that all of your values are numeric, including the target. J48 is a classification algorithm, so Weka won't let you use it when the target attribute is numeric.
Which column in the data is the target?
If you want to use a classification algorithm, you need to do one of the following: use a numeric-to-nominal filter on the target feature, use an ARFF file in which you specify that the target column is nominal, or rename the values of the column to non-numeric values. Here is a link to an ARFF file where the last column (race) is defined as nominal: https://drive.google.com/open?id=0B7b0iysQV1SEcjJJUE1lc19fR2c
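For the second option, the nominal declaration in the ARFF header lists the allowed labels in curly braces. As a rough sketch (the helper nominal_attribute is made up for illustration, and it assumes the labels contain no spaces or commas), the declaration line can be generated from the distinct values found in the target column:

```python
def nominal_attribute(name, values):
    """Build an ARFF nominal attribute declaration from observed values."""
    # Distinct labels, sorted as strings (so "10" sorts before "2").
    labels = sorted(set(values))
    return "@attribute %s {%s}" % (name, ",".join(labels))


# Hypothetical target column whose values look numeric.
print(nominal_attribute("race", ["2", "1", "2", "3"]))
```

Replacing the numeric declaration of the target attribute with a line of this form makes Weka treat the column as a class with a fixed set of labels.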

Output classification predictions to CSV in Weka--where is output file saved?

Answers to this question explain how to output classification predictions to CSV in Weka in both Weka 3.6 (right / option-click model and then save predictions) and 3.7 (choose more options and select Output predictions).
In Weka 3.7, I chose more options, selected Output predictions, and chose CSV as the specific type of output. An answer suggests to "Click on 'outputFile' and select a folder and type a filename." However, I cannot see 'outputFile' or where the CSV output is saved.
Where is the output file saved, or how can I click on 'outputFile' to name the output?
In Weka 3.7.12 on OS X, I was able to find 'outputFile' and the other options by clicking on the white box containing CSV (after choosing CSV first), much like how you specify the options for certain classifiers by clicking on those white boxes. I wasn't able to type a filename, but if I created a blank file manually first, I was able to save the predictions to that file.
If I left-click on CSV (after selecting it first), Weka allows me to select an existing CSV file to save the predictions to.

How to export PCA from Weka

I am performing PCA on my dataset using Weka (filter-unsupervised-principal component). Once I apply the filter, I get the principal components. However, I am not able to export them to a separate file for further processing. How do I export the first 3 principal components to a CSV or TXT file from Weka?
The "Save..." button at the top right of the Preprocess tab in Weka Explorer will export your PCA-filtered data. You will be prompted for the name and type of file you'd like to export to.
You can control the number of allowed principal components via the -M parameter to the filter, or you could export to a .csv file, open in a spreadsheet application, and remove all but the first three columns.
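The spreadsheet step can also be scripted. This is a minimal sketch using only the Python standard library; the function name keep_first_columns and the sample data are made up for illustration, and it assumes the principal components are the leading columns of the exported CSV file.

```python
import csv
import io


def keep_first_columns(csv_text, n=3):
    """Return CSV text that keeps only the first n columns of each row."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow(row[:n])
    return out.getvalue()


# Hypothetical export with four columns.
print(keep_first_columns("pc1,pc2,pc3,pc4\n0.1,0.2,0.3,0.4\n"))
```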

MFC - Format rtf document to two columns

I am merging many rtf files into a single file for printing. In order to save paper, I would like to have the printout of the merged rtf document in two columns per page.
What is the best way to do this?
I found out a way of doing this.
Load the rtf document into a CRichEditCtrl.
Use the CRichEditCtrl's FormatRange method to format and render the text to different parts of the page: the left column and the right column in this case.