How to open a tab-delimited file in Weka - weka

When I try to open a tab-delimited file in Weka, it says "file format is not recognized". The subsequent dialog box shows weka.core.converters.CSVLoader and says "Reads a source that is in comma separated or tab separate format." When I click the OK button, it throws an error saying "wrong number of values. Read 11, expected 10 line 4." I checked the same file in Excel and confirmed that the line has 10 fields.
Could someone advise a workaround?
The data file cannot be converted to CSV format because some of the fields contain a comma.

After installing the unofficial Weka package common-csv-weka-package, you can load tab-delimited files using the CommonCSVLoader loader. Simply change the loader's format from DEFAULT to TDF (-F command-line option).

I had the same problem. So far the best solution I have found is to use R to convert the tabular data file into ARFF. Search for the two phrases "import data to R" and "export R data to weka arff". My second choice is to open the CSV file or Excel workbook in JMP or SAS and then export it as CSV.

I found a solution for Windows 10. Install the rio package for R from this URL:
https://cran.r-project.org/web/packages/rio/index.html
Install RStudio from:
https://www.rstudio.com/products/rstudio/download/#download
From the prompt in RStudio, follow the Import, Export, and Convert Data Files instructions here:
https://cran.microsoft.com/snapshot/2015-11-15/web/packages/rio/vignettes/rio.html
It works a treat and converted my .tsv files to Weka ARFF format with no problem. The only thing I haven't done yet is test the ARFF files in Weka (and compare with Python sklearn results), so I'm hoping there isn't a problem with commas embedded in the text message bodies. Scikit-Learn and TfidfVectorizer have no problems with embedded commas in a .tsv file!
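On the concern about embedded commas: a tab-delimited file can be written out as CSV if the fields that contain commas are quoted. The following is a minimal sketch using Python's standard csv module; the function name tsv_to_csv and the sample data are made up for illustration, and whether Weka's CSVLoader then accepts the quoted fields is an assumption to verify with your Weka version.

```python
import csv
import io


def tsv_to_csv(tsv_text: str) -> str:
    """Convert tab-delimited text to CSV, quoting fields where needed."""
    # QUOTE_NONE makes the reader treat quote characters in the input as plain text.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    out = io.StringIO()
    # QUOTE_MINIMAL quotes only fields that contain a comma, a quote, or a newline.
    writer = csv.writer(out, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Hypothetical sample: the second field of the first data row contains a comma.
sample = "id\tmessage\n1\tHello, world\n2\tNo comma here\n"
print(tsv_to_csv(sample))
```

For a real file, read it with open(path, newline="", encoding="utf-8") and pass the contents to the function, or adapt the reader and writer to work on file objects directly.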

Related

How can I compile a GeoLite CSV file into MMDB again?

I have made a few corrections to location names in a GeoLite2 CSV file.
My site only retrieves locations from the MMDB file, so how can I compile the changed CSV file back into the MMDB binary?
I searched everywhere for a solution but can't find it.
Thanks for any tip.
Carlos
Currently there are only 2 open source MMDB file writers:
MaxMind::DB::Writer (Perl language)
Go MaxMind DB Writer (Go language)
Unfortunately, the second one has only a subset of the Perl writer's features, but it should be enough to write a program that reads the CSV file line by line, creates the mmdbtype instances, and writes the MMDB file.
You can check out our mmdbctl utility tool.
To convert a CSV file to an MMDB file use the import command:
$ mmdbctl import --in data.csv --out data.mmdb
Instructions, features, and documentation are available here: github.com/ipinfo/mmdbctl.
Right now it only supports string data types, and not nested data types. See this issue for more information.

Convert NA values to ? automatically while loading

Is there a way to automatically convert NA values to ? in weka while loading .csv files?
Or do we have to use some other script/program to either replace them with ? or a blank space before loading into weka.
Any help or suggestions are welcome. Thanks
Unfortunately I do not believe Weka has a way to do this conversion. This is the case because Weka's native format is .arff files. In .arff files, missing values are denoted with a "?". When a .csv file is loaded, it expects missing values to also be denoted by "?".
Depending on your method of using Weka I suggest:
For the Weka GUI, use "find and replace" in a simple text editor to change "NA" to "?" before loading the .csv into Weka.
For the Weka Java API, write a method to preprocess your ".csv" file before handing it over to the Weka .csv loader.
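The preprocessing step can also be done with a small script instead of find and replace. Below is a minimal sketch in Python; the function name replace_na and its parameters are made up for illustration. It replaces only cells that are exactly "NA", so values such as "NANA" are left alone, and it assumes the first row is a header that should not be changed.

```python
import csv
import io


def replace_na(csv_text: str, na_token: str = "NA", missing: str = "?") -> str:
    """Replace cells equal to na_token with the missing-value marker."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for index, row in enumerate(reader):
        if index == 0:
            # Assumption: the first row holds column names, so keep it as-is.
            writer.writerow(row)
            continue
        # Compare whole cells so substrings like "NANA" are not touched.
        writer.writerow([missing if cell.strip() == na_token else cell for cell in row])
    return out.getvalue()


# Hypothetical sample data.
print(replace_na("a,b\n1,NA\nNA,NANA\n"))
```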

Importing Prophet Projection file into SAS

Our customer has provided us with a Prophet ".projection" file, which appears to be a binary file (lots of special characters when opening in notepad - ?Š…kÿd?Š…kÿd? ).
My question is - how can this file be imported into SAS? The file was generated from Prophet version 8.1 (PE). In Prophet 7.3 it was possible to use Readbin73 to do the binary->text conversion of the PRJs etc. Is there anything similar for 8.1?
Open the Prophet Workspace (.PRW) file to which your .projection file relates.
Click on Results & Runlogs in the sidebar and expand the option that says "ResultType: Projection".
In the upper left corner of the ribbon you will see two options:
"Copy Grid" allows you to copy the data to the clipboard, after which you can paste the tab separated data in the programme of your choice.
"Copy Grid to Excel" allows you to export the data directly to Excel.
You can then prepare the data for SAS.
These steps work in Prophet 8.2.

Issues when loading data with weka

I am trying to load some CSV data in Weka: gene expression features for 12 patients. There are around 22,000 features. However, when I load the CSV file, it says
not recognized as an "CSV data files' file
to my csv file.
I am wondering whether this is because of the number of features or something else. I have checked the CSV file and it is cleanly comma separated. Any suggestions?
I would not encourage you to use CSV files in Weka. While it is entirely possible (http://weka.wikispaces.com/Can+I+use+CSV+files%3F), it has some severe drawbacks. Try to generate an ARFF file from your CSV instead.
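One way to generate the ARFF file is a short script. The following is a minimal sketch in Python, not a complete converter: the function name csv_to_arff and the sample data are made up for illustration. It assumes the first row is a header, every row has the same number of columns, a column is numeric when all of its values parse as numbers, and any other column is nominal. Empty cells and "?" are written as missing values.

```python
import csv
import io

MISSING = {"", "?"}


def _is_number(value: str) -> bool:
    # Note: float() also accepts forms such as "nan" or "1_000".
    try:
        float(value)
        return True
    except ValueError:
        return False


def _quote(text: str) -> str:
    """Single-quote names or labels that contain spaces or special characters."""
    if text and all(ch.isalnum() or ch in "_-." for ch in text):
        return text
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def csv_to_arff(csv_text: str, relation: str = "data") -> str:
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = [f"@relation {_quote(relation)}", ""]
    for i, name in enumerate(header):
        column = [row[i] for row in data if row[i] not in MISSING]
        if all(_is_number(v) for v in column):
            lines.append(f"@attribute {_quote(name)} numeric")
        else:
            # Nominal attribute: list the distinct labels in sorted order.
            labels = ",".join(_quote(v) for v in sorted(set(column)))
            lines.append(f"@attribute {_quote(name)} {{{labels}}}")
    lines += ["", "@data"]
    for row in data:
        cells = ["?" if v in MISSING else v if _is_number(v) else _quote(v) for v in row]
        lines.append(",".join(cells))
    return "\n".join(lines) + "\n"


# Hypothetical sample: one numeric feature and one nominal class column.
print(csv_to_arff("gene1,class\n0.5,sick\n1.2,healthy\n"))
```

The script reads the whole table into memory, which should be fine for 12 rows even with around 22,000 columns.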

how can i read datasets in Weka?

I want to use some of the datasets available at the website of the Weka to perform some
experiments with Neural Networks.
What do I have to do to read the data?
I downloaded the datasets and they were saved as .arff.txt, so I removed the .txt extension to leave only .arff. I then used this file as input, but an error occurs.
Which is the right way to read data?
Do I have to write code?
Please help me.
Thank you
I'm using Weka 3.6.6 and coc81.arff opens just fine. You are using Weka 3.7.x, which is the development branch of Weka. I suggest that you download 3.6.6 or 3.6.7 (the latest stable release) and try to open the file again.
There is also another simple way:
Open your dataset file in Excel (in my case MS Excel 2010) and format the fields by type.
Save it as CSV.
Then load that CSV file in the Weka Explorer and save it to the local drive in ARFF format.
Maybe this helps.