How can I read datasets in Weka? - weka

I want to use some of the datasets available on the Weka website to run some
experiments with neural networks.
What do I have to do to read the data?
I downloaded the datasets and they were saved as .arff.txt, so I removed the .txt extension to leave only .arff. I then used this file as input, but an error occurs.
Which is the right way to read data?
Do I have to write code?
Please help me.
Thank you

I'm using Weka 3.6.6 and coc81.arff opens just fine. You are using Weka 3.7.x, which is the development branch of Weka. I suggest that you download 3.6.6 or 3.6.7 (the latest stable release) and try to open the file again.

There is also another simple approach:
open your dataset file in Excel (MS Excel 2010 in my case), format the fields by type,
and save it as CSV.
Then load that CSV file in the Weka Explorer and save it to the local drive in ARFF format.
Maybe this helps.

Related

SAS Enterprise Guide - How can I download a ZIP file from a website, extract its contents and run it?

I would like to have a task to download a continuously updated .ZIP data file from a specific website, extract its contents and run the file inside.
I am looking for code that performs these tasks so that, whenever the data on the website is updated, my data file is updated along with it.
How can I do it?
Please help!
SAS Enterprise Guide 8.2
I didn't find a solution for it.
Read Chris's method for reading a zip file with the filename statement here: https://blogs.sas.com/content/sasdummy/2014/01/29/using-filename-zip/
Chris is the community director at SAS (something like that). A zip file will be locked on an update. Check the file date and store the date somewhere so you can see if it changed.
There are lots of ways to approach your problem and I don't know your constraints. Start with Chris's post and work from there.
You could download the zip using proc http.
This macro will let you unzip it: https://core.sasjs.io/mp__unzip_8sas.html
And this macro will give you the recursive directory contents: https://core.sasjs.io/mp__tree_8sas.html

cPickle.load() doesn't accept non-.gz files, what can I use for .pkl files?

I am trying to run an example of an LSTM recurrent neural network that is presented in this repository: https://github.com/mesnilgr/is13.
I've installed Theano and everything, and when I got to the point of running the code, I noticed the data was not being downloaded. So I opened an issue on GitHub (https://github.com/mesnilgr/is13/issues/12) and this guy came up with a solution that consists of:
1. Get the data from the Dropbox link he provides.
2. Change the code of the 'load.py' file to download and read the data properly.
The only issue is that the data in the Dropbox folder (https://www.dropbox.com/s/3lxl9jsbw0j7h8a/atis.pkl?dl=0) is not a compressed .gz file as, I suppose, the data from the original repository was. I don't have enough skill to change the code so that it does with the uncompressed data exactly what it would do with the compressed one. Can someone help me?
The suggested modification and the changes I've made are described in the issue I opened on GitHub (https://github.com/mesnilgr/is13/issues/12).
It looks like your code is using
gzip.open(...)
But if the file is not gzipped then you probably just need to remove the gzip. prefix and use
open(...)
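To avoid editing the call each time the data changes format, here is a minimal sketch that picks the opener by checking the first two bytes of the file (gzip streams start with 0x1f 0x8b). The helper name load_pickle is hypothetical, and the sketch assumes Python 3's pickle module; on Python 2 the import would be cPickle instead. A pickle written by Python 2 may additionally need an encoding argument such as pickle.load(handle, encoding="latin1") when read on Python 3.

```python
import gzip
import pickle

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def load_pickle(path):
    """Load a pickle from `path`, whether or not the file is gzip-compressed."""
    # Peek at the header to decide how the file has to be opened.
    with open(path, "rb") as probe:
        is_gzipped = probe.read(2) == GZIP_MAGIC
    opener = gzip.open if is_gzipped else open
    # Pickles must be read in binary mode in both cases.
    with opener(path, "rb") as handle:
        return pickle.load(handle)
```

With this, load_pickle("atis.pkl") and load_pickle("atis.pkl.gz") would both go through the same code path in 'load.py'.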

How to open a tab-delimited file in Weka

When I try to open a tab-delimited file in Weka it says: "file format is not recognized". In the subsequent dialog box it shows weka.core.converters.CSVLoader and says "Reads a source that is in comma separated or tab separate format." When I click the OK button, it throws an error saying "wrong number of values. Read 11, expected 10 line 4." I verified the same file in Excel that the line had 10 fields.
Could someone advise a workaround?
The data file cannot be converted to CSV format because some of the fields contain a comma.
After installing the unofficial Weka package common-csv-weka-package, you can load tab-delimited CSV files using the CommonCSVLoader loader. Simply change the loader's format from DEFAULT to TDF (-F command-line option).
I had the same problem. So far the best solution I have found is using R to convert a tabular data file into ARFF. Google the two phrases "import data to R" and "export R data to weka arff". My second choice is using JMP or SAS to open a CSV file or Excel workbook and then export it as CSV.
I found a solution: for Windows 10, install the R language package from this url:
https://cran.r-project.org/web/packages/rio/index.html
install RStudio from:
https://www.rstudio.com/products/rstudio/download/#download
from the prompt in RStudio follow the Import, Export, and Convert Data Files instructions here:
https://cran.microsoft.com/snapshot/2015-11-15/web/packages/rio/vignettes/rio.html
Works a treat: it converted my .tsv files to Weka ARFF format with no problem. The only thing I haven't done yet is test the ARFF files in Weka (and compare with Python sklearn results), as I'm hoping there isn't a problem with commas embedded in the text message bodies. Scikit-Learn and TfidfVectorizer have no problems with embedded commas in a .tsv file!
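If installing R is not an option, the same conversion can be sketched in plain Python. This is a rough sketch and not a tested Weka workflow: the function names tsv_to_arff and arff_quote are hypothetical, the file is assumed to have a header row and UTF-8 encoding, and every column is declared as a string attribute, so numeric or nominal columns would need their @attribute lines edited afterwards. Each value is single-quoted so that embedded commas stay inside the field, and the field count of every line is checked against the header.

```python
import csv


def arff_quote(value):
    """Single-quote a value for ARFF, escaping backslashes and single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def tsv_to_arff(tsv_path, arff_path, relation="converted"):
    """Write every column of a headered TSV file as an ARFF string attribute."""
    with open(tsv_path, newline="", encoding="utf-8") as src:
        # QUOTE_NONE: split on tabs only and treat quote characters as plain text.
        rows = list(csv.reader(src, delimiter="\t", quoting=csv.QUOTE_NONE))
    header, data = rows[0], rows[1:]
    with open(arff_path, "w", encoding="utf-8") as out:
        out.write("@relation " + arff_quote(relation) + "\n\n")
        for name in header:
            out.write("@attribute " + arff_quote(name) + " string\n")
        out.write("\n@data\n")
        for line_no, row in enumerate(data, start=2):
            # Report the offending line instead of writing a malformed row.
            if len(row) != len(header):
                raise ValueError(
                    f"line {line_no}: read {len(row)} values, expected {len(header)}"
                )
            out.write(",".join(arff_quote(value) for value in row) + "\n")
```

Because the quoting is switched off on the reading side, a stray quote character in a text field cannot swallow a tab and shift the column count.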

How do you convert hdf5 files into a format that is readable by SAS Enterprise Miner(sas7bdat)

I have a subset of the data set called the 'Million Song Dataset', available on the website (http://labrosa.ee.columbia.edu/millionsong/), on which I would like to perform data mining operations in SAS Enterprise Miner (13.2).
The subset I have downloaded contains 10,000 files and they are all in HDF5 format.
How do you convert HDF5 files into a format that is readable by SAS Enterprise Miner (sas7bdat)?
On Windows there is an ODBC driver for HDF5. If you have SAS/ACCESS ODBC then you can use that to read the file.
I don't think it's feasible to do this directly, as HDF5 seems to be a binary file format. You might be able to use another application to convert HDF5 to a plain text format and then write SAS code to import that.
I think some of the other files on this page might be easier to import:
http://labrosa.ee.columbia.edu/millionsong/pages/getting-dataset

Difference in file size of an Excel file when downloading directly as opposed to open and saving it

Maybe the title of my question is really awful, but I couldn't figure out a better way to frame it. The problem is that I have a Silverlight web app that does some processing and generates an Excel file as output. The Excel generation code uses the OpenXML format to create various XML parts and packages, and I compress the generated file using System.Packaging.CompressionOptions. Now, when the browser (IE 9) shows a download options box, if I click Open to open the file in Excel and then do a SaveAs, it saves the file with a further reduced size. If I instead hit Save directly on the download box, it saves the file with whatever size it was created with.
Any ideas why these 2 ways of saving the same file result in different sizes?
Cheers
Depending on how you used the OpenXML library, there might be some inefficiencies or errors. Resaving the file in Excel will fix any duplicate formatting, update the metadata (possibly reducing it) and fix any validation errors. I recommend using the Open XML SDK 2.0 Productivity Tool, provided with the OpenXML SDK, to check for validation errors and to better understand where further inefficiencies might lie. It is also possible to resave the file automatically in Excel by using Interop (from C#, at least).
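Since an OpenXML workbook is a ZIP package, one way to see where the size difference comes from is to compare the parts of the two files directly. The following is a minimal sketch using Python's standard zipfile module; the function names part_sizes and diff_parts and the file names in the usage note are hypothetical.

```python
import zipfile


def part_sizes(xlsx_path):
    """Map each part name to (compressed bytes, uncompressed bytes)."""
    with zipfile.ZipFile(xlsx_path) as package:
        return {
            info.filename: (info.compress_size, info.file_size)
            for info in package.infolist()
        }


def diff_parts(path_a, path_b):
    """List (part name, sizes in A, sizes in B) for parts that differ.

    A part that exists in only one package is reported with None
    for the other side.
    """
    sizes_a, sizes_b = part_sizes(path_a), part_sizes(path_b)
    return [
        (name, sizes_a.get(name), sizes_b.get(name))
        for name in sorted(set(sizes_a) | set(sizes_b))
        if sizes_a.get(name) != sizes_b.get(name)
    ]
```

Calling diff_parts("generated.xlsx", "resaved.xlsx") would show, for example, whether a styles part shrank after Excel rewrote it or whether only the compression ratio changed.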