WEKA - Compute the average number of non-missing attributes per class label

I'm working with a dataset with a lot of missing values. For analytical purposes, I want to be able to compute the average number of non-missing attributes assigned to each class label in the following manner.
Having data like
#relation class
#attribute one {1,2}
#attribute two {1,2}
#attribute three {1,2}
#attribute class {human, animal}
#data
1,Na,Na,human
1,1,Na,human
Na,Na,2,animal
I want to be able to obtain a result like this.
Average non-missing attributes per class label:
- human = 1.5
- animal = 1
Is there any way of doing this in the WEKA explorer?
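Outside the Explorer, the computation itself is small. The following is a minimal sketch in plain Java (no Weka dependency), assuming each data row is a simple comma-separated line whose last value is the class label and that missing values are written as the token Na, as in the sample. The class name NonMissingPerClass and the method average are hypothetical names chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class NonMissingPerClass {

    // Returns, for each class label, the mean number of non-missing attribute
    // values (class label excluded) over the rows carrying that label.
    static Map<String, Double> average(List<String> rows, String missingToken) {
        Map<String, int[]> totals = new LinkedHashMap<>(); // label -> {nonMissing, rowCount}
        for (String row : rows) {
            String[] values = row.trim().split(",");
            String label = values[values.length - 1].trim();
            int nonMissing = 0;
            for (int i = 0; i < values.length - 1; i++) {
                if (!values[i].trim().equals(missingToken)) {
                    nonMissing++;
                }
            }
            int[] t = totals.computeIfAbsent(label, k -> new int[2]);
            t[0] += nonMissing;
            t[1]++;
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, int[]> e : totals.entrySet()) {
            result.put(e.getKey(), (double) e.getValue()[0] / e.getValue()[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("1,Na,Na,human", "1,1,Na,human", "Na,Na,2,animal");
        System.out.println(average(rows, "Na")); // {human=1.5, animal=1.0}
    }
}
```

If the file uses the standard ARFF missing marker instead, pass "?" as the second argument.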

Related

SAS covariates in a linear regression

I am running a simple linear regression in SAS. The regression has three groups of participants as the predictor (with group 1 as the reference), a continuous social support variable as the outcome, and five covariates. Three of the covariates are dichotomized (age, sex, and education), one is a three-level nominal variable (marital status), and the last is continuous (a chronic disease index).
My question is: Do I need to specify the different types of covariates in the SAS coding somehow?
Would this coding example be correct?:
proc glm data=work.example;
class group age sex education marital education chronic_diseases;
model social_support = group age sex education marital education chronic_diseases;
estimate 'group 1 vs group 2' group -1 1 0;
estimate 'group 1 vs group 3' group -1 0 1;
run;
The class statement tells SAS to treat a variable as non-continuous: that is, categorical or binary. It doesn't differentiate between the two. By default, PROC GLM sorts the levels in ascending order and uses the last one as the reference unless you specify a reference group.
For example, if you're comparing Apples and Oranges, SAS will use Oranges as the reference value. Hey, they're fruit - you can compare fruit to fruit! :)
All model covariates are treated as continuous unless they are listed in a class statement. Since chronic_diseases is continuous, simply remove it from the class statement; otherwise, SAS will treat every distinct value of chronic_diseases as a level and compare each of them to the reference level.
proc glm data=work.example;
class group age sex education marital;
model social_support = group age sex education marital chronic_diseases;
estimate 'group 1 vs group 2' group -1 1 0;
estimate 'group 1 vs group 3' group -1 0 1;
run;

How to get attributes per classes from Weka

I have 11 classes/categories in my data set. And for every class there are some instances assigned to it. I need to know the attributes/words that Weka extracted per category and the numeric value assigned to each attribute. Is there a way to do that?
This is a sample of the ARFF file format that Weka uses for training and classification.
In this case it is a portion of a file used in the SemEval 2014 competition for the Spanish...
#relation Task10EnglishS2014
#attribute PathLenAlign numeric
#attribute ResAlign numeric
#attribute LcAlign numeric
#attribute WupAlign numeric
#attribute Res numeric
#attribute Lc numeric
#attribute DiceSimilarityAttribute numeric
#attribute NumericEvaluation numeric
#data
1,9.5852985,3.637587,1,8.0142254,3.637587,0.75,5.000
1,9.20881283333333,3.637587,1,8.3916004,3.637587,1,5.000
0.625,2.812914,2.754695,0.761905,2.812914,2.754695,0.5,0.292893218813452,0.300
...
Piece by piece:
#relation Task10EnglishS2014
#relation + name of the set or experiment
#attribute LcAlign numeric
#attribute + name of attribute + type of attribute
#data
From here on come the instances, one vector of values per input.
This is the training set; it is used to train a model that classifies new instances.
In the Weka Explorer, load this file in the Preprocess tab. Then, in the Classify tab, choose a classifier, select Cross-validation with 10 folds, and click the Start button. This generates a trained model.
The ARFF file to classify must have the following structure:
#relation Task10EnglishS2014
#attribute PathLenAlign numeric
#attribute ResAlign numeric
#attribute LcAlign numeric
#attribute WupAlign numeric
#attribute Res numeric
#attribute Lc numeric
#attribute DiceSimilarityAttribute numeric
#attribute NumericEvaluation numeric
#data
1,9.5852985,3.637587,1,8.0142254,3.637587,0.75,?
1,9.20881283333333,3.637587,1,8.3916004,3.637587,1,?
0.625,2.812914,2.754695,0.761905,2.812914,2.754695,0.5,0.292893218813452,?
...
The ? symbol marks the value to classify.
Then select the "Supplied test set" option and choose the file to classify. Under "More options..." select "Output predictions", then right-click the model and select "Re-evaluate model on current test set".
The results are displayed in the right panel.
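Turning training rows into rows to classify only requires replacing the last value with a question mark. A minimal sketch in plain Java, assuming simple comma-separated rows with no quoted commas and the class value in the last position; the class name ClassMasker and the method maskLastValue are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ClassMasker {

    // Replaces the last comma-separated value of each row with "?".
    static List<String> maskLastValue(List<String> dataRows) {
        List<String> masked = new ArrayList<>();
        for (String row : dataRows) {
            int lastComma = row.lastIndexOf(',');
            // A row without a comma holds only the class value.
            String prefix = lastComma < 0 ? "" : row.substring(0, lastComma + 1);
            masked.add(prefix + "?");
        }
        return masked;
    }

    public static void main(String[] args) {
        List<String> training = List.of(
            "1,9.5852985,3.637587,1,8.0142254,3.637587,0.75,5.000",
            "1,9.20881283333333,3.637587,1,8.3916004,3.637587,1,5.000");
        // Prints the two rows with 5.000 replaced by ?
        maskLastValue(training).forEach(System.out::println);
    }
}
```

The header lines are left untouched, so the file to classify keeps the same attribute declarations as the training file.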

Weka Time Series Forecast, More Attributes

This link explains how to create a forecast model based on a date field and measures.
It works when I do not include any other attributes, but I need to pass in more fields, such as SKU number, point-of-sale number, etc.
In those cases it returns an error with a 'bad sorting' message, even though the data is sorted in ascending order by the date field.
Any help? Am I missing something simple?
EDIT:
Working table:
DateField TIMESTAMP
Sales FLOAT
Non working table:
DateField TIMESTAMP
Sales FLOAT
SKU VARCHAR
PoS VARCHAR
Non-working ARFF sample
#relation 'C:\\Users\\admpentaho\\Desktop\\file1.arff'
#attribute dat_dok date 'yyyy/MM/dd HH:mm:ss.SSS'
#attribute sif_rob {0051S12,0312S13,0762S21,160047,160049,160051,160058,2101S11,250000,250001,250002,250003,250006,250007,250008,2562S12,280001,280002,280003,280004,280005,280006,280007,280008,280009,280010,280011,280012,280013,280014,280015,280017,280018,280019,280020,280021,280022,280023,280024,280027,280028,280043,280044,280045,280047,280049,280050,280053,'2810 S12',3081S21,370002,370003,370007,370015,50004,50005,50006,50007,50008,50015,50018,50019,50023,50034,50070,50073,50076,50077,50101,50103,50112,50116,50118,50138,50148,50149,50161,50185,50186,50196,50198,50199,50203,50204,50236,50238,50239,50240,50241,50242,50243,50244,50245,50246,50251,50252,50264,50272,50273,50274,50279,50295,50296,50299,50307,50308,70001,70003,70005,70006,70007,70008,70013,70014,70015,70016,70017,70018,70028,70029,70032,70033,70035,70039,70040,70042,70043,70044,70046,70048,70049,70065,70070,70073,70074,70078,70079,70085,70087,70088,70089,70092,70096,70097,70098,70100,70136,70137,70144,70156,70167,70168,70169,70241,702426,70243,70245,79003,900327,900334,900336,900353,900354,900356,900362,ALP3150,ALP3151,AM12012,AM12013,AM12014,AM12015,AM12016,AM12017,AM12018,AM12019,AM12050,AMAN57788,AMAN58725,BER3044,BER3047,BER3048,BER3054,BER3070,BIS10001,BIS10108,BIS10223,BIS10224,BIS10289,BIS10307,BIS10309,BIS10313,BIS10315,BIS10362,BIS10369,BIS10370,BIS10371,BIS10379,BIS10380,BIS11107,BIS11208,BIS11236,BIS11237,BIS11238,BIS11304,BIS11306,BIS11320,BIS11327,BIS12015,BIS12018,BIS12072,BIS12074,BIS12076,BIS12078,BIS12079,BIS12080,BIS12081,BIS12083,BIS12084,BIS12091,CAS3372,CAS3376,CAS3377,CAS3378,CAS3379,CE001,CE002,CE003,CE004,CE005,CE106348,CE701850,CE701852,DE0FX3176,DE0SI0012,DE0SI1034,DE0SI1041,DE0SW0209,DE0SX0007,DE0SX0011,DE0SX0012,DE0SX0015,DE0SX0034,DE0SX0041,DE0SX1093,DE0UA2112,DE0UD2100,DE0UN2103,DE0UN2104,DEGN20GN1,DEPA10985,DEPPE1976,DERI10850,DERI40853,DERI50854,DERI60855,DESE26919,DESG86916,DESOF4001,DESOL1001,DESOL2002,DMK3130,DMK3131,DMK3132,DMK3133,DMK8008,DON0006,DON0007,DON0008,DON00
13,DON0016,E0111461,E0111463,E0111466,E0111505,E0111506,E0112153D,E0112215,E0112353,E0112488,E0112535,E0112536,E100378,E101119,E101280,E101327,E101331,E101599,E101661,E101661D,E101661K,E101685,E102033,E103896,E105309,E106347,E106348,E106353,E106354,E106356,E106357,E106358,E106359,E106360,E108758,E108759,E1100834,E1100912,E1101523,E1103885,E1104176,E1104579,E1105041,E1106190,E1106349,E1106350,E1106361,E1106362,E1106363,E1106831,E1108159,E1108551,E1109364,E1109697,E1109747,E1111521,E1200368,E1202281,E1202799,E1203602,E1208085,E1208086,E1208662,E1211522,E1308084,E1501017,E1501144,E1501145,E1501146,E1501545,E1502172,E1505223,E1505854,E1507020,E1507948,E1507950,E1601055,E1704580,E1705080,E2006333,E700784,E701153,E701154,E701155,E701197,E701197D,E701596,E701787,E701850,E701850D,E701852,E701852D,E701856,E701856D,E701858,E702665,E704253,E706522,E707095,E707454,E707461,E707605,E707617,E707856,E708265,E708266,E708373,E708374,E708375,E708678,E708685,E708988,E708989,E708990,E709361,E709671,E709884,E710598,EPAL,EVB-2000,EVB-2001,EVB-303,EVB-411,EVB-412,EVB-413,EVB-414,EVB-416,EVB-418,EVB-421,EVB-422,EVB-424,EVB-500293,EVB-500294,EVB-500295,EVB-500296,EVB-500297,EVB-500329,EVB-500330,EVB-500331,EVB-500396,EVB-500397,EVB-500399,EVB-500400,EVB-500402,EVB-500403,EVB-500404,EVB-500405,EVB-500717,EVB-500750,EVB-500751,EVB-500752,EVB-500756,EVB-500778,EVB-500781,EVB-501075,EVB-501076,EVB-502130,EVB-502140,EVB-502290,EVB-502291,EVB-502552,EVB-502680,EVB-502681,EVB-502682,EVB-502683,EVB-502897,EVB-502953,EVB-503051,EVB-503251,EVB-503252,EVB-503255,EVB-503283,EVB-503284,EVB-503290,EVB-503291,EVB-503651,EVB-503654,EVB-503781,EVB-K411,EVB-K413,EVB-K421,EVB-K422,EVB-K424,FI31002,FI31011,FLA000021,FLA000027,FLA000028,FLA000030,FLA000031,FLA000033,FLA000066,FLA000067,FLA000075,FLA000083,FLA000084,FLA000087,FLA000089,FLA000091,FLA000111,FLA000112,FLA000133,FLA000138,FLA000241,FLA000242,FLA000243,FLA000279,FLA000330,FLA000331,FLA000386,FLA000487,FLA000589,FLA000715,FLA000839,FLA001164,FLA001165,
FLA001223,FLA001225,FLA001226,FLA001363,FLA001474,FLA001475,FLA001888,FLA001949,FLA002289,FLA002291,FLA002292,FLA002431,FLA002763,FLA002764,FLA002792,FLA002921,FLA002922,FLA002923,FLA002934,FLA002940,FLA002944,FLA002946,FLA002948,FLA002952,FLA002956,FLA002958,FLA002961,FLA002965,FLA002967,FLA002969,FLA002983,FLA002985,FLA002987,FLA002989,FLA002991,FLA002993,FLA003001,FLA003009,FLA003024,FLA003026,FLA003028,FLA003031,FLA003035,FLA003039,FLA003045,FLA003050,FLA003052,FLA003089,FLA003181,FLA003182,FLA003183,FLA003184,FLA003186,FLA003214,FLA003215,FLA003216,FLA003273,FLA003283,FLA003320,FLA003330,FLA003331,FLA003333,FLA003334,FLA003527,FLA003601,FLA003728,FLA003729,FLA003730,FLA003731,FLA003916,FLA003917,FLA003919,FLA003927,FLA003934,FLA004105,FLA004129,FLA004130,FLA004196,FLA004244,FLA004299,FLA004300,FLA004478,FLA004479,FLA004497,FLA004535,FLA004536,FLA004537,FLA004538,FLA004539,FLA004540,FLA004542,FLA004543,FLA004544,FLA004545,FLA004550,FLA004551,FLA004552,FLA004553,FLA004599,FLA004608,FLA004614,FLA005674,FLA006086,FLA006087,FLA290001,FRS064,GE17919,GE19948,GE19949,GEP61504,GL110000,GL210000,GL340100,GL360200,GL360201,GOR1120,GOR1213,GOR1301,GOR1308,GOR1311,GOR1315,GOR1316,GOR1339,GOR1340,GOR1341,GOR1604,GOR1817,GOR1852,GOR1854,GOR1902,GOR1905,GOR1906,GOR1907,GOR1912,GOR1913,GOR1927,GOR1929,GOR1930,GOR1932,GOR1933,GOR1934,GOR1942,GOR1944,GOR1945,GOR1947,GOR1948,GOR1952,H1834230,H2013212,H2017280,H2017283,H2017288,H2295500,H2295800,H2298188,H2395100,H2395166,H2395166A,H2395250A,H2395520A,H2395540,H2395566,H2395840,H2395866,H2395866A,H2494131,H2494241,H2494311,H2496131,H2496241,H2496311,H2498131,H2498241,H2498311,H2499231,H2499241,H2499311,H2817541,H2818541,H2819541,H2943251,H2961001,H2961101,H2961201,HLS001,HOC5006,HOC5007,HOC5108,HOC6008,HOC6009,HOC9000,HOC9001,HOC9002,HOC9007,HOC9009,HOC9015,HUG016,HUG017,HUG019,HUG020,HUG1607,HUGLS0102,HUGLS0103,J1423204,J1764002,J1764003,J1855405,J2061704,J2079703,J2080503,J2103103,J2103303,J2103603,J2129506,J2133406,J2141110,J214
2209,J2144402,J2148404,J2265102,J2279902,J2318705,J2364102,J2364202,J2364702,J2364802,J2364902,J2372202,J2375802,J2423819,J2466503,J2500800,J2584509,J2584706,J2589804,J3098916,J3098917,J3101902,J3106903,J3166005,J3184001,J3185101,J3185401,J3276504,J3276805,J3277102,J3309702,J3328304,J3333602,J3362404,J3365006,J3369805,J3382502,J3398204,J3519004,J3646716,J3660202,J3666000,J3666400,J3682507,J3699704,J3803900,J3809405,J3834607,J3835107,J3841200,J3854102,J3854402,J3854702,J3855002,J3887401,J3897402,J3898002,J3922901,J3954502,J3983101,J3995301,J4010601,J4023602,J4023702,J4069403,J4100102,J4145904,J4146303,J4272302,J4290002,J4292000,J4292701,J4293401,J4307300,J4313100,J4318700,J431900,J4319000,J4346101,J43709,J4394702,J4432601,J4458601,J4459701,J4501700,J4670402,J4730908,J4736310,J4748307,J4775203A,J4775304A,J4789400,J4789502,J4952004,J4985800,J4993200,J4993300,J5217309,J5217907,J5218006,J5218106,J5218405,J5337500,J5462500,J5479200,J5705100,J5705200,J5705300,J5768200,J5768300,J5787900,J5800300,J5805000,J5806105,J5852100,J5883001,J6008100,J6008400,J6037300,J6041600,J6046200,J6047400,J6057301,J6057401,J6127800,J6148800,J6212700,J6213300,J6213800,J6213900,J6214000,J6214100,J6214200,J6214300,J6214400,J6214500,J6214600,J6214700,J6214800,J6214900,J6215300,J6215400,J6215600,J6215800,J6216000,J6216100,J6216200,J6216300,J6216400,J6216600,J6216700,J6217000,J6246000,J6250800,J6281000,J6287000,J6306700,J6307300,J6308600,J6309100,J6354503,J6390201,J6394600,J6424300,J6424400,J6428503,J6428901,J6454700,J6460700,J6463100,J6468500,J6479400,J6479600,J6489501,J6504800,J6504900,J6638000,J6651100,J6667301,J6673203,J6686800,J6705700,J6764800,J6766002,J6766100,J6777800,J6849200,J6849300,J6849400,J6849600,J6849900,J6889701,J702226,J702227,J702228,J702229,J7023900,J7125300,J7125700,J7144900,J7188400,J7290900,J7324500,J7519300,J7567700,J7567800,J7650100,J7650400,J7650400D,J7653400,J7653400D,J7656600,J7693500,J7696700,J7706600D,J7759001,J7759101,J7869100,J8003704,J8046100,J8060201,J8060500,J8083500
,J8083600,J8083701,J8083901,J8084001,J8084100,J8084201,J8084300,J8084400,J838704,J838810,JCC003,JJB033,JJB035,JJB036,JJB047,JJB048,JJB049,JJB050,JJB051,JJB052,JJJB0030,JJJB010,JJJB0106,JJJB0114,JJJB0118,JJJB012,JJJB0120,JJJB0121,JJJB0124,JJJB0125,JJJB013,JJJB019,JJJB020,JJJBO20135,JJOB020415,JJRX503204,JO14007,JO14008,JO14009,JO14010,JO14011,JO14016,JO14017,JO14018,JO14019,JO14050,JOB006,JRE003,JRE004,JRE007,JRE008,JRE010,K3086160,K3158150,K3262010,K3270661,K3270670,K3415800,K3417600,K3482121,K3519880,K3527090,K3568140,K3570140,K3571140,K3697800,K3698210,K3745030,K3824120,K3855270,K4521020,K4584040,K4700430,K4725040,K4729010,K4739350,K4762230,K4762240,K4845030,K4845040,K4933950,K4933960,K4934100,K4934110,K4961350,K4961360,K4961450,K4961800,K4961810,K5016385,K5027020,K5110510,K5115120,K5605000,K5606000,KLBT0107,KLBT0108,KLFT0105,MI14045,MI14046,MI14047,MI14048,MP00314,MP00317,MP00331,MP00419,MP00580,MP00834,MP00999,MP01064,MP01084,MP01098,MP01152,MP01154,MP01214,MP01247,MP01254,MP01321,MP01369,MP01435,MP01455,MP01456,MP01543,MP01584,MP01593,MP01595,MP01600,MP01601,MP01605,MP01606,MP01608,MP01640,MP01645,MP01653,MP01689,MP01694,MP01701,MP01865,MP01866,MP02051,MP02059,MP02126,MP02183,MP02408,MP02410,MP02420,MP02459,MP02491,MP02563,MP02610,MP02615,MP02936,MP02952,MP02968,MP02974,MP02977,MP03017,MP03020,MP03263,MP04125,MP04261,MP04321,MP04474,MP04703,MP05206,MP05313,MP05314,MP05460,MP05462,MP05716,MP05768,MP06144,MP06145,MP06198,MP06339,MP06349,MP06436,MP06466,MP06467,MP06468,MP06609,MP06610,MP07183,MP07248,MP07249,MP07290,MP07313,MP07673,MP07696,MP07702,MP07876,MP08113,MP08119,MP08128,MP08136,MP08225,MP08247,MP08248,MP08302,MP08318,MP08416,MP08474,MP08476,MP08962,MP09019,MP09336,MP09540,MP09558,MP10118,MP11079,MP11085,MP11259,MP11543,MP11560,MP11664,MS008,PCF001,PCF003,PCF004,PCF005,PCOR001,PEB001,PEB002,PEB003,PEB004,PEB005,PEB006,PEB013,PEB014,PEB018,PEB019,PEB020,PFR004,PFR007,PFR008,PFRS004,PFRS005,PFRS007,PHUG003,PHUG004,PHUG005,PJ&J103,PJ&J105,PJ&J106,PJ&J107,PJ&J
108,PJ&J110,PJ&J113,PJ&J114,PJ3185401,PJ4010601,PJ7519300,PJB001,PJB006,PJB007,PJB009,PJB010,PJB011,PJB012,PJB013,PJB014,PJB015,PJB019,PLM001,PLM002,PNT001,PO0045,PO0046,POB001,POB002,POB003,POB0033,POB004,POB005,POB006,POB010,POB011,POB012,PPJ5800300,RE13207,RE13208,RE13209,RE13210,RE13211,RE13212,RE13304,RE13307,RE13308,RE13309,RE13310,RE13311,RH2395540,RJ5217907,RJ5218006,RJ5218106,RJ5218405,RJ5986705,RJ6213300,RJ6213800,RJ6214000,RJ6214600,RJ6215800,RJ7759101,RS0107,RSKO004,RSKO005,RSKO006,RSKO007,RSKO008,RSKO009,RSKO010,RSKO011,RSKO012,RSKO013,RSKO014,RSKO015,RSKO016,RSKO017,SAN1220,SAN2000,SAN2003,SAN2029,SAN2201,SAN2302,SAN2400,SAN2401,SAN2502,SAN3256,SAN5029,SKO0001,SKO1013,SKO1064,SKO1381,SKO1382,SKO1411,SKO161,SKO163,SKO164,SKO169,SKO171,SKO1718,SKO1720,SKO2093,SKO2094,SKO3320,SKO3564,SKO3603,SKO3683,SKO3729,SKO3730,SKO3812,SKO4194,SKO4272,SKO4281,SKO611,SKO612,SKO706,SKO711,SKO724,SKO777,SKO944,SKO988,SP13200,SP13202,SP13203,SP13204,SP13205,SP13206,SP13300,SP13301,SP13302,SP13303,SP13306,TEFS11E27,UNI12215721,UNI14402415,UNI14520114,UNI14520312,UNI15440220,UNI15440321,UNI15440422,UNI15667103,UNI16661001,UNI16874401,UNI16874601,UNI17112413,UNI17137701,UNI17137901,UNI17139805,UNI17276602,UNI17783401,UNI18304712,UNI19047102,UNI19102908,UNI19249808,UNI19418414,UNI19421510,UNI19423004,UNI19461610,UNI19656103,UNI19905101,UNI26704201,UNI39047202,V0001,V0002,V1050001,V1050002,V1051001,V1051002,V1051011,V1052001,V1052002,V1052003,V1052007,V1052008,V1052009,V1052010,V1052011,V1052045,V1052046,V1052047,V1052048,V1053001,V1053002,V1053004,V1054001,V1054002,V1054004,V1055001,V1055002,V1100012,V1100013,V1100014,V1155001,V1155002,V1250001,V1250002,V1250003,V1250004,V1250005,V1250006,V1251001,V1251002,V1251003,V1251004,V1251023,V1251024,V1251032,V1251038,V1253001,V1253002,V1253003,V1253004,V1253005,V1253006,V1253007,V1253008,V1253019,V1254002,V1254003,V1254004,V1255001,V1255002,V1255003,V1255004,V1545001,V1545002,V1545003,V1545004,V1545005,V1545007,V1758001,V1758002,V175
8003,V1758004,V1859001,V1859002,V1859003,V1859004,V1860001,V1860002,V1860003,V1946001,V1946002,V1946003,V1946006,V1947001,V2061001,V2061002,V2061003,V2061004,V2061005,V2061006,V2365002,V2365003,V2700007,V2800001,V2800002,V2800013,V2800014,V3300051,V3887001,V3887002,VA001,VA002,VA003,VA4103101412,VA4103101414,VA4106101412,VA4106101414,VA4703101436,VA4706101436,VA4903121412,VA4903121414,VA4906121412,VA4906121414,VA56703101402,VA56706101402,VA57666101451,VIPC001,WES3021,WES3024,WES3026,WES3031,WES3035,WES3069,WES3079,WES3080,ZOT1000,ZOT1001,ZOT1002,ZOT1003,ZOT1004,ZOT1005,ZOT1006,ZOT1009,ZOT1011,ZOT1013,ZOT1014,ZOT1043,ZOT1044,ZOT1045,ZOT1046,ZOT16003,ZOT16004,ZOT16005,ZOT2000,ZOT2001,ZOT2010,ZOT2025,ZOT2028,ZOT2043,ZOT2045,ZOT2046,ZOT2047,ZOT2047D,ZOT2108,ZOT2110,ZOT2402,ZOT2403,ZOT2405,ZOT2406,ZOT2407,ZOT3003,ZOT3004,ZOT3005,ZOT3014,ZOT3076,ZOT3106,ZOT3107,ZOT3108,ZOT3109,ZOT3135,ZOT3136,ZOT3137,ZOT3152,ZOT3153,ZOT3154,ZOT6006,ZOT6007,ZOT61396,ZOT9043,ZOT9044}
#attribute vred_rab numeric
#data
'2013/01/03 00:00:00.000',PHUG005,4255.3
'2013/01/03 00:00:00.000',PJ7519300,17708.2
'2013/01/03 00:00:00.000',PNT001,13780.7
'2013/01/09 00:00:00.000',GEP61504,1117.8
'2013/01/09 00:00:00.000',TEFS11E27,341.6
'2013/01/10 00:00:00.000',280001,-9000.7
'2013/01/10 00:00:00.000',280005,-2663

I get the wrong result using IBk in Weka

I have two data sets: one for training and another for testing.
I want to find the first nearest neighbor of the first instance of the test data set among the training data set, so I wrote the following code:
data.setClassIndex(data.numAttributes() - 1);
data1.setClassIndex(data1.numAttributes() - 1);
IBk knn = new IBk();
String[] options = new String[2];
options[0]= "-E";
options[1]= "-I";
knn.setOptions(options);
knn.setKNN(1);
knn.setCrossValidate(true);
knn.buildClassifier(data);
int d = knn.getKNN();
Double c = knn.classifyInstance(data1.instance(0));
System.out.println(d);
System.out.println(c);
I don't know why I get the wrong result. The instance whose nearest neighbor I am looking for is in the training set as well, so the result should be the same value as the test instance, but it is not!
My Training dataset:
#relation MyRelation
#attribute Fname {Tina,Alex,Poul,Johan,Sarah}
#attribute Lname {Sansed,Erikson,Nadi,Raj,Maad}
#attribute Status {Single,Maried}
#attribute sex {Fmale,male}
#attribute 'worked years' numeric
#attribute age numeric
#attribute Adults numeric
#attribute tax numeric
#attribute salary numeric
#data
Tina,Sansed,Single,Fmale,14,35,5,362.79,1332.5
Alex,Erikson,Maried,male,14,40,6,0,1245.3
Poul,Nadi,Maried,male,6,35,6,207.32,1150.8
Johan,Raj,Maried,male,29,48,5,0,959
Sarah,Maad,Single,Fmale,16,42,2,0,667.1
Now, in the test data set, I want to determine the nearest neighbor for the instance:
Johan,Raj,Maried,male,29,48,5,0,959
The Result of running the code is:
1
1332.5
It gives me the first instance of the training set every time I try!
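For comparison, this is what a plain 1-nearest-neighbour lookup does with the sample data. It is a sketch in plain Java, not Weka's IBk implementation: the class OneNearestNeighbour, its predict method, and the distance used (0 or 1 for nominal values, range-scaled squared difference for numeric values) are assumptions made for illustration. Because the query row is also in the training set, its distance to itself is zero, so the predicted salary is that row's own value.

```java
import java.util.Arrays;
import java.util.List;

final class OneNearestNeighbour {

    // Predicts the class value (last column) of the query as the class value
    // of the closest training row.
    static double predict(List<String[]> train, String[] query, boolean[] isNominal) {
        int attrs = isNominal.length; // predictor columns; the class column follows them
        double[] min = new double[attrs];
        double[] max = new double[attrs];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
        // Collect the training range of every numeric column for scaling.
        for (String[] row : train) {
            for (int a = 0; a < attrs; a++) {
                if (!isNominal[a]) {
                    double v = Double.parseDouble(row[a]);
                    min[a] = Math.min(min[a], v);
                    max[a] = Math.max(max[a], v);
                }
            }
        }
        double bestDistance = Double.POSITIVE_INFINITY;
        double prediction = Double.NaN;
        for (String[] row : train) {
            double distance = 0;
            for (int a = 0; a < attrs; a++) {
                double d;
                if (isNominal[a]) {
                    d = row[a].equals(query[a]) ? 0 : 1;
                } else {
                    double range = max[a] - min[a];
                    // A constant column carries no information and is skipped.
                    d = range == 0 ? 0
                        : (Double.parseDouble(row[a]) - Double.parseDouble(query[a])) / range;
                }
                distance += d * d;
            }
            if (distance < bestDistance) {
                bestDistance = distance;
                prediction = Double.parseDouble(row[attrs]);
            }
        }
        return prediction;
    }

    public static void main(String[] args) {
        List<String[]> train = List.of(
            "Tina,Sansed,Single,Fmale,14,35,5,362.79,1332.5".split(","),
            "Alex,Erikson,Maried,male,14,40,6,0,1245.3".split(","),
            "Poul,Nadi,Maried,male,6,35,6,207.32,1150.8".split(","),
            "Johan,Raj,Maried,male,29,48,5,0,959".split(","),
            "Sarah,Maad,Single,Fmale,16,42,2,0,667.1".split(","));
        boolean[] isNominal = {true, true, true, true, false, false, false, false};
        String[] query = "Johan,Raj,Maried,male,29,48,5,0,959".split(",");
        System.out.println(predict(train, query, isNominal)); // 959.0
    }
}
```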

Weka: How to prepare a test set in Weka

I have been using an SVM classifier with the following data:
#relation whatever
#attribute mfe numeric
#attribute GB numeric
#attribute GTB numeric
#attribute Seeds numeric
#attribute ABP numeric
#attribute AU_Seed numeric
#attribute GC_Seed numeric
#attribute GU_Seed numeric
#attribute UP numeric
#attribute AU numeric
#attribute GC numeric
#attribute GU numeric
#attribute A-U_L numeric
#attribute G-C_L numeric
#attribute G-U_L numeric
#attribute (G+C) numeric
#attribute MFEi1 numeric
#attribute MFEi2 numeric
#attribute MFEi3 numeric
#attribute MFEi4 numeric
#attribute dG numeric
#attribute dP numeric
#attribute dQ numeric
#attribute dD numeric
#attribute Outcome {Yes,No}
#data
-24.3,1,18,2,9,4,3,0.5,8,10,7,1,0.454545455,0.318181818,0.045454545,7,-0.157792208,-0.050206612,-1.104545455,-1.35,-1.104545455,0,0,0,Yes
-24.8,2,15,2,7.5,2,3,1,7,5,8,2,0.208333333,0.333333333,0.083333333,8,-0.129166667,-0.043055556,-0.516666667,-1.653333333,-1.033333333,0,0,0,No
-24.4,1,16,3,5.333333333,1.666666667,2.666666667,1,4,5,8,3,0.217391304,0.347826087,0.130434783,8,-0.132608696,-0.046124764,-1.060869565,-1.525,-1.060869565,0,0,0,Yes
-24.2,1,18,2,9,2,2.5,1,10,5,11,2,0.227272727,0.5,0.090909091,11,-0.1,-0.05,-1.1,-1.344444444,-1.1,0,0,0,Yes
-24.5,3,17,2,8.5,2,3,1,5,6,9,2,0.272727273,0.409090909,0.090909091,9,-0.123737374,-0.050619835,-0.371212121,-1.441176471,-1.113636364,-0.12244898,0,0,Yes
This is my training set, and it defines whether each instance belongs to the Yes class or the No class. My question is: my test data comes from an unknown source and I have no idea which class it belongs to, so how do I prepare my test set? Without the Outcome attribute, Weka gives the "error: Data mismatch" message. How do I prepare the test set so that the SVM separates my instances into the Yes and No classes?
Steps to prepare the test set:
1. Create a training set in CSV format.
2. Create the test set in CSV format as well, with the same number and types of attributes.
3. Copy the test set, paste it at the end of the training set, and save the result as a new CSV file.
4. Import the CSV file saved in step 3 using Weka >> Explorer >> Preprocess.
5. Under Filter, choose filters >> unsupervised >> instance >> RemoveRange.
6. Click the field that says "RemoveRange -R first-last".
7. Specify the range you want to remove: for example, if the training data had 100 instances, enter first-100 and apply the filter.
8. Save the result as an ARFF file; this can be used as the test set.
Then use this set. If you still get any errors, reply to this post.
If you don't want to go through that hassle, you can prepare your test set with exactly the same attribute names, data types, and data ranges as your training set, and of course with the attribute values. The class attribute must be present, but its value should be a question mark (?). For instance, to convert your given training set to a test set, the following change can be made:
`#relation whatever-TEST
#attribute mfe numeric
#attribute GB numeric
#attribute GTB numeric
#attribute Seeds numeric
#attribute ABP numeric
#attribute AU_Seed numeric
#attribute GC_Seed numeric
#attribute GU_Seed numeric
#attribute UP numeric
#attribute AU numeric
#attribute GC numeric
#attribute GU numeric
#attribute A-U_L numeric
#attribute G-C_L numeric
#attribute G-U_L numeric
#attribute (G+C) numeric
#attribute MFEi1 numeric
#attribute MFEi2 numeric
#attribute MFEi3 numeric
#attribute MFEi4 numeric
#attribute dG numeric
#attribute dP numeric
#attribute dQ numeric
#attribute dD numeric
#attribute Outcome {Yes,No}
#data
-24.3,1,18,2,9,4,3,0.5,8,10,7,1,0.454545455,0.318181818,0.045454545,7,-0.157792208,-0.050206612,-1.104545455,-1.35,-1.104545455,0,0,0,?
-24.8,2,15,2,7.5,2,3,1,7,5,8,2,0.208333333,0.333333333,0.083333333,8,-0.129166667,-0.043055556,-0.516666667,-1.653333333,-1.033333333,0,0,0,?
-24.4,1,16,3,5.333333333,1.666666667,2.666666667,1,4,5,8,3,0.217391304,0.347826087,0.130434783,8,-0.132608696,-0.046124764,-1.060869565,-1.525,-1.060869565,0,0,0,?
-24.2,1,18,2,9,2,2.5,1,10,5,11,2,0.227272727,0.5,0.090909091,11,-0.1,-0.05,-1.1,-1.344444444,-1.1,0,0,0,?
-24.5,3,17,2,8.5,2,3,1,5,6,9,2,0.272727273,0.409090909,0.090909091,9,-0.123737374,-0.050619835,-0.371212121,-1.441176471,-1.113636364,-0.12244898,0,0,?
`
Do we need to replace the values of the last attribute with a question mark in the test data?
I am confused.
I tested my data using two methods:
1. Removing the values of the last attribute and putting ? as a replacement.
2. Using the test data as it is (not removing the class attribute).
Whether you are evaluating a trained model on a dataset or trying to make predictions with a trained model, the dataset has to have the exact same structure as the training data (attribute names, attribute types, order of nominal labels). This includes the class attribute.
If you want to test your model, then you need ground truth values to compare the predictions against. Otherwise you cannot generate statistics.
If you want to make predictions, then the class values should be all missing.
For removing the class values, you can either do that manually, or you can use the missing-values-imputation Weka package. Use the weka.filters.unsupervised.attribute.MissingValuesInjection filter in conjunction with the ClassOnly injection scheme.
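The "exact same structure" requirement can be checked before loading the files. A minimal sketch in plain Java, assuming the header lines are available as strings and that a purely textual comparison of the attribute declarations is good enough (it is stricter than necessary, for example about spaces inside a nominal label list). The class HeaderCheck and its methods are hypothetical; the relation name is deliberately ignored, since it may differ between training and test files.

```java
import java.util.ArrayList;
import java.util.List;

final class HeaderCheck {

    // Collects attribute declarations: lines whose first token ends with
    // "attribute" (so both "@attribute" and the "#attribute" spelling used in
    // the listings here are recognised), with runs of whitespace collapsed.
    static List<String> attributeDeclarations(List<String> lines) {
        List<String> declarations = new ArrayList<>();
        for (String line : lines) {
            String normalised = line.trim().replaceAll("\\s+", " ");
            String firstToken = normalised.split(" ")[0].toLowerCase();
            if (firstToken.endsWith("attribute")) {
                declarations.add(normalised.substring(firstToken.length()).trim());
            }
        }
        return declarations;
    }

    // True when both files declare the same attributes, in the same order,
    // with the same types and the same order of nominal labels.
    static boolean sameStructure(List<String> trainLines, List<String> testLines) {
        return attributeDeclarations(trainLines).equals(attributeDeclarations(testLines));
    }

    public static void main(String[] args) {
        List<String> train = List.of("#relation whatever",
            "#attribute mfe numeric", "#attribute Outcome {Yes,No}", "#data");
        List<String> test = List.of("#relation whatever-TEST",
            "#attribute mfe numeric", "#attribute Outcome {Yes,No}", "#data");
        List<String> reordered = List.of("#relation whatever-TEST",
            "#attribute mfe numeric", "#attribute Outcome {No,Yes}", "#data");
        System.out.println(sameStructure(train, test));      // true
        System.out.println(sameStructure(train, reordered)); // false
    }
}
```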