I created an ARFF file and opened it in Weka. Weka reads the file as having a single attribute, although it has ten. Can anyone help me solve this issue?
The file contents are shown below:
@Relation Dentition_Records
@attribute N NUMERIC
@attribute Animal String
@attribute I NUMERIC
@attribute i NUMERIC
@attribute C NUMERIC
@attribute c NUMERIC
@attribute P NUMERIC
@attribute p NUMERIC
@attribute M NUMERIC
@attribute m NUMERIC
@Data
1;Opossum;5;4;1;1;3;3;4;4
2;Hairy tail mole;3;3;1;1;4;4;3;3
3;Common mole;3;2;1;0;3;3;3;3
4;Star nose mole;3;3;1;1;4;4;3;3
5;Brawn bat;2;3;1;1;3;3;3;3
6;Sliver hair bat;2;3;1;1;2;3;3;3
7;Pigmy bat;2;3;1;1;2;2;3;3
8;House bat;2;3;1;1;1;2;3;3
9;Red bat;1;3;1;1;2;2;3;3
10;Hoary bat;1;3;1;1;2;2;3;3
11;Lump nose bat;2;3;1;1;2;3;3;8
12;Armadillo;0;0;0;0;0;0;3;3
13;Pika;2;1;0;0;2;2;3;3
14;Snowshoe rabbit;2;1;0;0;3;2;3;3
15;Beaver;1;1;0;0;2;1;3;3
16;Marmot;1;1;0;0;2;1;3;3
17;Groundhog;1;1;0;0;2;1;3;3
18;Prairie Dog;1;1;0;0;2;1;3;3
19;Ground Squirrel;1;1;0;0;2;1;3;3
20;Chipmunk;1;1;0;0;2;1;3;3
21;Gray swuirrel;1;1;0;0;2;1;3;3
22;Fox squirrel;1;1;0;0;1;1;3;3
23;Pocket gopher;1;1;0;0;1;1;3;3
24;Kangaroo rat;1;1;0;0;1;1;3;3
25;Pack rat;1;1;0;0;0;0;3;3
26;Field mouse;1;1;0;0;0;0;3;3
27;Muskrat;1;1;0;0;0;0;3;3
28;Black rat;1;1;0;0;0;0;3;3
29;House mouse;1;1;0;0;0;0;3;3
30;Porcupine;1;1;0;0;1;1;3;3
31;Guinea pig;1;1;0;0;1;1;3;3
32;Coyote;1;1;1;1;4;4;3;3
33;Wolf;3;3;1;1;4;4;2;3
34;Fox ;3;3;1;1;4;4;2;3
35;Bear;3;3;1;1;4;4;2;3
36;Civet cat;3;3;1;1;4;4;2;2
37;Raccon;3;3;1;1;4;4;3;2
38;Marten;3;3;1;1;4;4;1;2
39;Ficher;3;3;1;1;4;4;1;2
40;Weasel;3;3;1;1;3;3;1;2
41;Mink;3;3;1;1;3;3;1;2
42;Ferrer;3;3;1;1;3;3;1;2
43;Wolverine;3;3;1;1;4;4;1;2
44;Badger;3;3;1;1;3;3;1;2
45;Skunk;3;3;1;1;3;3;1;2
46;River otter;3;3;1;1;4;3;1;2
47;Sea otter;3;2;1;1;3;3;1;2
48;Jaguar;3;3;1;1;3;2;1;1
49;Ocelot;3;3;1;1;3;2;1;1
50;Cougar;3;3;1;1;3;2;1;1
51;Lynx;3;3;1;1;3;2;1;1
52;Fur seal;3;2;1;1;4;4;1;1
53;Sea Lion;3;2;1;1;4;4;1;1
54;Walrus;1;0;1;1;3;3;0;0
55;Grey seal;3;2;1;1;3;3;2;2
56;Elephant seal;2;1;1;1;4;4;1;1
57;Peccary;2;3;1;1;3;3;3;3
58;ELK;0;4;1;0;3;3;3;3
59;Deer;0;4;0;0;3;3;3;3
60;Moose;0;4;0;0;3;3;3;3
61;Reindeer;0;4;1;0;3;3;3;3
62;Antelope;0;4;0;0;3;3;3;3
63;Bison;0;4;0;0;3;3;3;3
64;Mountain goat ;0;4;0;0;3;3;3;3
65;Musk ox;0;4;0;0;3;3;3;3
66;Mountain sheep;0;4;0;0;3;3;3;3
Replace all semicolons (;) with commas (,) and surround string values with single quotes (') (this is required when a value contains spaces), and your file will load correctly.
Example:
@Relation Dentition_Records
@attribute N NUMERIC
@attribute Animal String
@attribute I NUMERIC
@attribute i NUMERIC
@attribute C NUMERIC
@attribute c NUMERIC
@attribute P NUMERIC
@attribute p NUMERIC
@attribute M NUMERIC
@attribute m NUMERIC
@Data
1,Opossum,5,4,1,1,3,3,4,4
4,'Star nose mole',3,3,1,1,4,4,3,3
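For a file with many rows, the replacement can be scripted. The following is only a minimal Python sketch of the idea, not part of the original answer: the helper name to_arff_row is hypothetical, and it assumes that every field that is not a plain number should be written as a quoted string.

```python
import re

# A field counts as numeric if it is an optionally signed integer or decimal.
NUMERIC = re.compile(r"-?\d+(\.\d+)?")

def to_arff_row(line, sep=";"):
    """Convert one delimited record into a comma-separated ARFF data row.

    to_arff_row is a hypothetical helper for illustration only.
    """
    out = []
    for field in line.split(sep):
        field = field.strip()  # drop stray spaces such as in "Fox "
        if NUMERIC.fullmatch(field):
            out.append(field)  # numeric values are written unchanged
        else:
            # quote string values and escape backslashes and single quotes
            escaped = field.replace("\\", "\\\\").replace("'", "\\'")
            out.append("'" + escaped + "'")
    return ",".join(out)

print(to_arff_row("4;Star nose mole;3;3;1;1;4;4;3;3"))
# prints: 4,'Star nose mole',3,3,1,1,4,4,3,3
```

Quoting every string, including single-word names, keeps the rule simple; Weka only needs the quotes when a value contains spaces.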
I want to replace the character 'O' (capital O) with '0' (zero), but the character appears in many different numbers.
Example:
In my dataset I have values such as 8OO, 9O, 1O1, etc., and I need to replace every O in them.
Thank you!
Use the TRANSLATE function
From the docs:
TRANSLATE(source, to-1, from-1 <, ...to-n, from-n>)
data want;
    set have;
    *replaces letter O with 0;
    newVariable = translate(oldVariable, "0", "O");
    *converts values from newVariable to a numeric value;
    newVarnum = input(newVariable, 8.);
run;
I'm transitioning from SQL Server to SAS.
In SQL Server we could get away with string comparisons where 'abc ' = 'aBc' would be true.
In SAS, so far I've had to STRIP and UPPER every string in every comparison.
Is there an option that can be set so that 'abc ' = 'aBc' is true?
My Google-Fu has failed me.
I believe you are looking for the COMPARE function with the 'i' modifier (for ignore case). It returns 0 when the two strings match.
(See p. 70 in here: http://support.sas.com/publishing/pubcat/chaps/59343.pdf)
data a;
    input string1 $ string2 $;
    datalines;
abc aBc
cba CBA
AbC ABC
AC AbC
BCA CAb
;
run;
data b;
    set a;
    c = compare(string1,string2);
    d = compare(string1,string2,'i');
run;
proc print noobs;
    where d = 0;
    var string1 string2;
run;
You can try the PRX functions which use Perl Regular Expressions.
The pattern '/abc/i' matches the string 'abc' in any case (because of the 'i' after the closing /).
Using PRXMATCH as an example:
prxmatch('/abc/i', 'aBc')
This returns 1, the position at which the match starts.
More on regular expressions: https://www.cs.tut.fi/~jkorpela/perl/regexp.html
PRX in SAS:
http://documentation.sas.com/?docsetId=lefunctionsref&docsetVersion=3.1&docsetTarget=n0bj9p4401w3n9n1gmv6tfshit9m.htm&locale=en
I have a set of numerical strings (used in filenames) that I would like to parse into vectors.
Here is an example:
-0_01_-1_0_23_0_52_-0_25
which should be parsed into
-0.01 -1 0.23 0.52 -0.25
The rules are:
There are 5 numbers in the range [-1, 1].
Numbers are separated by '_'.
The decimal point is replaced by '_'.
Integer values {-1, 0, 1} have no decimal point.
How can I use a regex (preferably in MATLAB) to convert the string into a vector?
I tried a few regular expressions but got stuck on the integer rule.
Use this code:
a = '-0_01_-1_0_23_0_52_-0_25';
a = strrep(a, '0_', '0.');
res = regexp(a, '(-?[0-9]+(?:\.[0-9]+)?)','match');
res = cellfun(@str2num, res)
First, replace 0_ with 0. and then use the -?[0-9]+(?:\.[0-9]+)? regex to match only the numbers.
The regex matches an optional -, then one or more digits, and then an optional substring consisting of . and one or more digits.
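For comparison, the same idea can be sketched in Python. This is an illustrative sketch rather than the original answer's method: the function name parse_vector is hypothetical, and it makes the same assumption as the code above, namely that an underscore directly after a leading 0 is always a decimal point (so an integer 0 followed by another number stays ambiguous).

```python
import re

# Either "0_<digits>" (a decimal written with "_" instead of ".")
# or a bare integer 0 or 1, each with an optional leading minus sign.
NUMBER = re.compile(r"-?(?:0_\d+|[01])")

def parse_vector(name):
    """Parse a filename fragment such as '-0_01_-1_0_23' into floats.

    parse_vector is a hypothetical helper for illustration only.
    """
    return [float(token.replace("_", ".")) for token in NUMBER.findall(name)]

print(parse_vector("-0_01_-1_0_23_0_52_-0_25"))
# prints: [-0.01, -1.0, 0.23, 0.52, -0.25]
```

Because the decimal form is tried first in the alternation, a value such as 0_10 is read as a single number instead of being split at its second 0.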
I am trying to convert a character column to numeric, and I have tried:
var=input(var,Best12.);
var=var*1;
Both of them returned character columns, and there was only one message in the log:
"Character values have been converted to numeric values at the places given by: (Line):(Column). 7132:4".
Is there another way to do this conversion in SAS?
(My apologies if this is trivial.)
Thanks!
What you're doing will work if you assign the result to a new variable, because the type of an existing SAS variable cannot be changed once it has been defined:
data tmp;
    char='1';
run;
data tmp;
    set tmp;
    num=char*1;
run;
proc contents; run;
I am working with a dataset that has more than 10,000 attributes. To use Weka I need to convert the text file into ARFF format, but because there are so many attributes, the file is too large even in sparse ARFF format. Is there a method, similar to the sparse format for data, that avoids writing so many attribute declarations in the ARFF header?
For example:
@attribute A1 NUMERIC
@attribute A2 NUMERIC
...
...
@attribute A10000 NUMERIC
I wrote an AWK script that converts lines like the following (from a TXT file) to ARFF.
example.txt source:
Att_0 | Att_1 | Att_2 | ... | Att_n
1 | 2 | 3 | ... | 999
My script (to_arff) is below; change the FS value to match the separator used in the TXT file:
#!/usr/bin/awk -f
# Usage: ./<script>.awk data.txt > data.arff
BEGIN {
    # separator used in the source TXT file
    FS = "|";
    # WEKA separator
    separator = ",";
}
# The first line holds the attribute names
NR == 1 {
    # WEKA headers
    split(FILENAME, relation, ".");
    # the relation's name is the source file's name
    print "@RELATION "relation[1]"\n";
    # attributes are "numeric" by default
    # types available: numeric, <nominal> {n1, n2, ..., nN}, string and date [<date-format>]
    for (i = 1; i <= NF; i++) {
        print "@ATTRIBUTE "$i" NUMERIC";
    }
    print "\n@DATA";
}
# Every other line is a data row: join its fields with the WEKA separator
NR > 1 {
    s = "";
    first = 1;
    for (i = 1; i <= NF; i++) {
        if (first)
            first = 0;
        else
            s = s separator;
        s = s $i;
    }
    print s;
}
Output:
@RELATION example
@ATTRIBUTE Att_0 NUMERIC
@ATTRIBUTE Att_1 NUMERIC
@ATTRIBUTE Att_2 NUMERIC
@ATTRIBUTE Att_n NUMERIC
@DATA
1,2,3,999
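When the attributes have no names of their own, the header can also be generated directly instead of being read from a first line. The following Python sketch is only an illustration under stated assumptions: the function names arff_header and sparse_row and the attribute prefix "A" are hypothetical. It also shows how a row looks in the sparse ARFF format, where only non-zero values are listed as "index value" pairs with indices starting at 0.

```python
def arff_header(relation, n_attributes, prefix="A"):
    """Build the ARFF header lines for n numeric attributes.

    The names A1..An follow the question's example; the prefix is an assumption.
    """
    lines = ["@RELATION " + relation, ""]
    lines += ["@ATTRIBUTE %s%d NUMERIC" % (prefix, i)
              for i in range(1, n_attributes + 1)]
    lines += ["", "@DATA"]
    return lines

def sparse_row(values):
    """Format one data row in sparse ARFF: only non-zero values, 0-based indices."""
    pairs = ["%d %s" % (i, v) for i, v in enumerate(values) if v != 0]
    return "{" + ", ".join(pairs) + "}"

print("\n".join(arff_header("example", 3)))
print(sparse_row([0, 5, 0, 2]))
# last line printed: {1 5, 3 2}
```

The header still contains one declaration per attribute, since ARFF has no shorthand for that, but generating it means it never has to be written by hand.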