I am trying to load an ARFF file in Weka, but it gives this error:
Unable to determine structure as arff (Reason:
java.io.IOException: } expected at end of enumeration, read Token[EOL], line 4)
@RELATION data1
@ATTRIBUTE attribute_0 {"T,"N,"A,"C,"V}
@ATTRIBUTE attribute_1 REAL
@ATTRIBUTE attribute_2 {""VRoot"",""0""",""1""",""Hide1"",1,10001",1",10002",10003",10004",10005",10006",10007",10008",10009",10010",10011",10012",10013",10014",10015",10016",10017",10018",10019",10020",10021",10022",10023",10024",10025",10026",10027",100
According to the ARFF Format Documentation, REAL is not a valid attribute type.
Try NUMERIC.
Also be careful with quotes. The parser may assume that " is used to quote strings, and your quotes do not match.
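To illustrate the fix, a cleaned-up header along those lines could look like the sketch below. The nominal value lists are only indicative, since the full list for attribute_2 is cut off in the question; the point is that each value appears exactly once, without stray quotes, and the list ends with a closing brace.
@RELATION data1
@ATTRIBUTE attribute_0 {T,N,A,C,V}
@ATTRIBUTE attribute_1 NUMERIC
% attribute_2: remaining values omitted here; list each once, unquoted or consistently quoted
@ATTRIBUTE attribute_2 {VRoot,0,1,Hide1,10001,10002}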
How to use the ChangeDateFormat filter in Weka (Waikato Environment for Knowledge Analysis) properly on preprocessing files with a date attribute?
I have the following CSV file:
Date,Open,High,Low,Close,Adj Close,Volume
2004-08-19,49.813290,51.835709,47.800831,49.982655,49.982655,44871361
2004-08-20,50.316402,54.336334,50.062355,53.952770,53.952770,22942874
2004-08-23,55.168217,56.528118,54.321388,54.495735,54.495735,18342897
2004-08-24,55.412300,55.591629,51.591621,52.239197,52.239197,15319808
... and so on.
When I open it with WEKA, it recognizes the first attribute as "Nominal", not "Date". Then, when I try to apply the ChangeDateFormat filter from filters → unsupervised → attributes on the "Date" attribute and click "Apply", Weka gives me an error:
Problem filtering instances: Chosen attribute not date.
However, there is no filter like "NominalToDate" (there are only "NominalToBinary" and "NominalToString"), and there is no "StringToDate" either.
Therefore, I had to rename the file to .arff and add the @attribute headers as follows:
@relation GOOG
@attribute Date date 'yyyy-MM-dd'
@attribute Open numeric
@attribute High numeric
@attribute Low numeric
@attribute Close numeric
@attribute AdjClose numeric
@attribute Volume numeric
@data
However, I didn't like the idea of manually tinkering with the files, so I want to know how to use the ChangeDateFormat filter in this case.
How can I use the ChangeDateFormat filter to specify the date format of the imported files, and, if it is not possible with the ChangeDateFormat, what are the use cases for this filter?
As you can see, the date in my dataset does not have a time part.
To import a CSV with a date attribute and no time part, turn on the "Invoke options dialog" checkbox in the "Open file..." dialog.
Then, in the "Options" dialog, specify the index of the date attribute and adjust the format accordingly by deleting the time part.
As a result, the date attribute in the imported dataset will have the "Timestamp" type.
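For completeness, the same import can be done programmatically. The following is only a rough sketch, assuming Weka 3.8 on the classpath and a hypothetical file name GOOG.csv; CSVLoader's setDateAttributes and setDateFormat play the role of the fields in the Options dialog.
// Illustrative sketch: load a CSV whose first column is a date without a time part.
import weka.core.Instances;
import weka.core.converters.CSVLoader;

public class LoadCsvWithDate {
    public static void main(String[] args) throws Exception {
        CSVLoader loader = new CSVLoader();
        loader.setSource(new java.io.File("GOOG.csv")); // hypothetical file name
        loader.setDateAttributes("1");        // treat the first column as a date
        loader.setDateFormat("yyyy-MM-dd");   // no time part, matching the data above
        Instances data = loader.getDataSet();
        System.out.println(data.attribute(0)); // should print a date attribute declaration
    }
}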
I'm quite new to Weka.
I was wondering: is it possible for Weka to classify two different datasets that consist of different attributes?
Example:
Dataset A : @attributes {UserID, Tags, Descriptions}
@data
a, #user, writing books
Dataset B : @attributes {UserID, Longitude, Latitude, Dates}
@data
xyz, 7895231, 453221.1, 28.10.2012
Is it possible to merge Dataset A and B, with different attributes, into one dataset in Weka? I was told that I can manually merge them in Excel before Weka classifies the data, but I was wondering how Weka reads the data. Is it row by row? Is it logical to put it in the following form (in Excel) and fill in the value 0?
Dataset AB : UserID, Tags, Descriptions, UserID, Longitude, Latitude, Dates
a, #user, writing books, 0, 0, 0
xyz, 0, 0, 7895231, 453221.1, 28.10.2012
Yes. This is covered in this posting:
https://list.waikato.ac.nz/pipermail/wekalist/2009-April/043232.html
This also covers the situation in which you want to append two files (add instances).
This is done in the Weka Command Line Interface (CLI).
One trick: there seems to be a line-length limit, so move your files to the default directory (which appears to be Program Files/Weka-3-8) so that long paths don't cause problems.
Suppose we have the file "merge A.arff" consisting of
@relation 'merge A'
@attribute UserID numeric
@attribute A1 {Joe,Bill,Larry}
@attribute A2 numeric
@attribute Aclass {pos,neg}
@data
1,Joe,17,pos
3,Joe,42,neg
5,Bill,8,neg
7,Larry,4,neg
and the file "merge B.arff" consisting of
@relation 'merge B'
@attribute BUserID numeric
@attribute Blong numeric
@attribute Blat numeric
@data
1,-180,42
3,-182,45
5,-179,36
7,-184,38
then if you open the CLI and type the following after the > prompt
java weka.core.Instances merge "merge A.arff" "merge B.arff"
the following will be dumped to the console:
@relation 'merge A_merge B'
@attribute UserID numeric
@attribute A1 {Joe,Bill,Larry}
@attribute A2 numeric
@attribute Aclass {pos,neg}
@attribute BUserID numeric
@attribute Blong numeric
@attribute Blat numeric
@data
1,Joe,17,pos,1,-180,42
3,Joe,42,neg,3,-182,45
5,Bill,8,neg,5,-179,36
7,Larry,4,neg,7,-184,38
For some reason, I'm having trouble piping this directly to another file, e.g.
java weka.core.Instances merge "merge A.arff" "merge B.arff" > "output.arff"
Either it's not creating the file, or I can't find where it's creating it. But one problem at a time!
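If the redirection keeps failing (the SimpleCLI is not a full shell, so > may simply be ignored there), one workaround is a few lines of Java that perform the same merge and write the result with ArffSaver. This is only a sketch under the same assumptions as above (Weka on the classpath, both files in the working directory); output.arff is just an example name.
// Sketch: merge two ARFF files side by side and save the result to a file.
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.ConverterUtils.DataSource;

public class MergeArff {
    public static void main(String[] args) throws Exception {
        Instances a = DataSource.read("merge A.arff");
        Instances b = DataSource.read("merge B.arff");
        // mergeInstances pairs row i of the first file with row i of the second
        Instances merged = Instances.mergeInstances(a, b);
        ArffSaver saver = new ArffSaver();
        saver.setInstances(merged);
        saver.setFile(new java.io.File("output.arff")); // example output name
        saver.writeBatch();
    }
}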
I'm new to all of this: data mining, the WEKA tool, etc.
In my academic project I have to deal with bug reports. I have them in my SQL Server. I took the bug summary attribute and applied tokenization, stop-word removal and stemming.
All the stemmed words in the summary are stored in the database, separated by semicolons. Now I have to apply a frequent pattern mining algorithm and find frequent itemsets using the WEKA tool. My ARFF file looks like this:
@relation ItemSets
@attribute bugid integer
@attribute summary string
@data
755113,enhanc;keep;log;recommend;share
759414,access;review;social
763806,allow;intrus;less;provid;shrunken;sidebar;social;specifi
767221,datacloneerror;deeper;dig;framework;jsm
771353,document;integr;provid;secur;social
785540,avail;determin;featur;method;provid;social;whether
785591,chat;dock;horizont;nest;overlap;scrollbar
787767,abus;api;implement;perform;runtim;warn;worker
After opening it in Weka, under the Associate tab of the WEKA Explorer, I'm unable to start the process (the Start button is disabled) with Apriori selected.
Please suggest how to find frequent itemsets on the summary attribute using WEKA. I'm in serious need of help. Any help will be appreciated. Thanks in advance!
The reason Apriori is not available with your file in Weka is that Apriori only allows nominal attribute values. What sort of rules are you trying to find? Could you give an example of the rules you want to obtain?
values_you_want_to_be_the_antecedent_part_of_your_rule ==> values_you_want_to_be_the_consequent_part_of_your_rule
Changing your attributes to nominal like this
@relation ItemSets
@attribute bugid {755113, 759414, 763806}
@attribute summary {'enhanc;keep;log;recommend;share', 'access;review;social', 'allow;intrus;less;provid;shrunken;sidebar;social;specifi'}
@data
755113,'enhanc;keep;log;recommend;share'
759414,'access;review;social'
763806,'allow;intrus;less;provid;shrunken;sidebar;social;specifi'
will only give you rules like
bugid=755113 1 ==> summary=enhanc;keep;log;recommend;share 1 <conf:(1)> lift:(3) lev:(0.22)
If you're looking for frequent itemsets among the summary words, the bugid is irrelevant and you can remove it from your file. Apriori is used to obtain association rules e.g. enhanc, keep gives log with support X and confidence Y. To find frequent itemsets, you need to restructure your data so that each summary word is an attribute with values true/false or true/missing, see this question.
Try the following file in Weka. Select Associate, choose Apriori, and double-click on the white input field next to the Choose button. There, set outputItemSets to true. In the console output, you will see all frequent itemsets and all obtained rules with sufficient support.
@relation ItemSets
@attribute enhanc {true}
@attribute keep {true}
@attribute log {true}
@attribute recommend {true}
@attribute share {true}
@attribute access {true}
@attribute review {true}
@attribute social {true}
@attribute allow {true}
@attribute intrus {true}
@attribute less {true}
@attribute provid {true}
@attribute shrunken {true}
@attribute sidebar {true}
@attribute specifi {true}
@data
true,true,true,true,true,?,?,?,?,?,?,?,?,?,?
?,?,?,?,?,true,true,true,?,?,?,?,?,?,?
?,?,?,?,?,?,?,true,true,true,true,true,true,true,true
The question marks (?) represent missing values.
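The same run can also be done outside the Explorer. The following is only a sketch, assuming the file above is saved under a made-up name itemsets.arff and Weka is on the classpath; setOutputItemSets corresponds to the outputItemSets option mentioned above.
// Sketch: run Apriori on the restructured file and print the frequent item sets.
import weka.associations.Apriori;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class FrequentItemsets {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("itemsets.arff"); // hypothetical file name
        Apriori apriori = new Apriori();
        apriori.setOutputItemSets(true);       // also print the frequent item sets
        apriori.setLowerBoundMinSupport(0.3);  // loosened for such a tiny example
        apriori.buildAssociations(data);
        System.out.println(apriori);           // item sets and rules
    }
}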
I am cleaning a dataset with Google OpenRefine and then trying to use it in Weka to do some cluster analysis. I am dealing with a nominal column that stores salary ranges.
I've specified the attribute as below
@ATTRIBUTE Income {'0-30000','30000-50000','50000-75000','75000-150000','>150000'}
In the data set there are rows in which the 'Income' column is null and I suppose that is the reason why I get the error:
'nominal value not declared in header, read Token line 13'
Is there a way I can replace the null values with a string (and then declare that string in the attribute)? If so, how do I specify it in the @ATTRIBUTE row?
Or would it be possible to include the null in the set of declared values?
Thanks
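For what it's worth, both ideas from the question can be expressed in the ARFF header. One option (the label 'Unknown' is only an example name) is to add a placeholder value to the nominal list and write that label in the rows where Income is null:
@ATTRIBUTE Income {'0-30000','30000-50000','50000-75000','75000-150000','>150000','Unknown'}
The other is to leave the declaration unchanged and write ? for those rows in the @DATA section, which Weka treats as a missing value.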
I want to use the KNN algorithm with TF-IDF in the WEKA GUI. First I ran the algorithm with the default settings. Then I set "IDFTransform" and "TFTransform" to "true" in the StringToWordVector filter and ran it again.
There is no difference between the two results.
Result1:
Correctly Classified Instances 1346 91.3781 %
Result2:
Correctly Classified Instances 1346 91.3781 %
My ".arff" file is as follows:
@relation et9
@attribute 'alis' real
@attribute 'banka' real
...
@attribute 'urun' real
@attribute 'class' {yes, no}
@data
70,0,0,0,3,0,40,0,3,1,0,0,20,0,717,2,4,0,0,0,2,5,0,0,0,717,0,1,0,30,yes
22,0,0,63,158,0,1,0,7,0,10,0,4,0,57,0,0,0,0,204,0,0,2,2,0,530,0,0,6,0,yes
0,0,1,0,0,0,0,0,2,1,3,0,0,0,0,0,5,0,0,0,0,0,2,1,0,0,0,0,0,0,no
...
I know that StringToWordVector is used for strings, but I want to calculate TF-IDF for this ".arff" file. How can I use my current ".arff" file and get a KNN result with TF-IDF?
(This is my academic work. Please help...)
According to Weka's documentation, the StringToWordVector filter "Converts String attributes into a set of attributes representing word occurrences [...]". Therefore, applying this filter to an arff file that does not contain any String attributes will have no effect on the dataset.
In order to make use of this filter, you will need to prepare an arff file that contains a String attribute, where the value of this attribute is the text for the given instance. For example, if each instance represents one tweet, then the text from the tweet would be the value for this String attribute. More information on working with text in weka is documented here.
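To make the flow concrete, here is a rough sketch of how the filter is typically applied once the data has a String attribute. The file name tweets.arff is made up; it stands for any ARFF whose text lives in a string attribute.
// Sketch: turn a String attribute into TF-IDF weighted word attributes,
// after which a KNN classifier (IBk) can be trained on the result.
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToWordVector;

public class TfIdfExample {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("tweets.arff"); // hypothetical file with a string attribute
        StringToWordVector s2wv = new StringToWordVector();
        s2wv.setTFTransform(true);    // log(1 + term frequency)
        s2wv.setIDFTransform(true);   // weight by inverse document frequency
        s2wv.setInputFormat(data);
        Instances vectors = Filter.useFilter(data, s2wv);
        System.out.println(vectors.numAttributes() + " attributes after vectorization");
    }
}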