Differences in calculation for Naive net in WEKA

I tried out Naive net in Weka 3.8 using the weather dataset.
Weka results:
OUTLOOK
Sunny      Yes = 0.238    No = 0.538
Overcast   Yes = 0.429    No = 0.077
Rainy      Yes = 0.333    No = 0.385
TEMPERATURE
Hot        Yes = 0.238    No = 0.385
Mild       Yes = 0.429    No = 0.385
Cool       Yes = 0.333    No = 0.231
But when I worked it out manually, my results are:
OUTLOOK
Sunny      Yes (2/9) = 0.22     No (3/5) = 0.60
Overcast   Yes (4/9) = 0.44     No (0/5) = 0
Rainy      Yes (3/9) = 0.333    No (2/5) = 0.40
TEMPERATURE
Hot        Yes (2/9) = 0.222    No (2/5) = 0.40
Mild       Yes (4/9) = 0.444    No (2/5) = 0.40
Cool       Yes (3/9) = 0.333    No (1/5) = 0.20
Why is there a difference?
Did my calculation go wrong somewhere?
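For what it is worth, the Weka figures can be reproduced from the same counts if one assumes additive smoothing with a pseudo-count of 0.5 added to every attribute value. That constant is an assumption inferred from the numbers in the question, not something the question states, and the helper names smoothed and table are hypothetical. A minimal sketch for the OUTLOOK attribute:

```python
# Counts from the weather dataset, as used in the manual calculation.
CLASS_TOTALS = {"yes": 9, "no": 5}
OUTLOOK_COUNTS = {
    "sunny": {"yes": 2, "no": 3},
    "overcast": {"yes": 4, "no": 0},
    "rainy": {"yes": 3, "no": 2},
}


def smoothed(count, class_total, n_values, alpha):
    """Additive smoothing: (count + alpha) / (class_total + alpha * n_values)."""
    return (count + alpha) / (class_total + alpha * n_values)


def table(counts, alpha):
    """Conditional probabilities P(value | class), rounded to 3 decimals."""
    n_values = len(counts)
    return {
        (value, cls): round(smoothed(c, CLASS_TOTALS[cls], n_values, alpha), 3)
        for value, per_class in counts.items()
        for cls, c in per_class.items()
    }


manual = table(OUTLOOK_COUNTS, alpha=0.0)     # plain relative frequencies
weka_like = table(OUTLOOK_COUNTS, alpha=0.5)  # assumed pseudo-count of 0.5

for key in manual:
    print(key, "manual:", manual[key], "smoothed:", weka_like[key])
```

With alpha set to 0 the function returns the hand-computed ratios; with alpha set to 0.5 it returns the values Weka printed, including the non-zero probability for Overcast/No.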

How to pause Execution in SAS for milliseconds

How can I pause execution for 5 milliseconds in SAS?
Can I use "CALL SLEEP (0.005)"?
I have checked the link below, but it is confusing:
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.2/lefunctionsref/n12ppys43orawkn1q0oxep4cmdk6.htm
The following pauses execution for 5 milliseconds in SAS:
data _null_;
  call sleep(5);
run;
The optional second argument, unit, specifies the unit of time in seconds that is applied to n. The default is .001 (milliseconds). You can change it to seconds if you like:
data _null_;
  call sleep(0.005,1);
run;
The above is equivalent to the former.
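The equivalence comes down to the pause length being n times unit. The arithmetic can be sketched in Python; the function name sleep_duration_seconds is hypothetical, and the default of .001 is taken from the answer above rather than checked against SAS here:

```python
import math

DEFAULT_UNIT = 0.001  # default unit as stated in the answer: milliseconds


def sleep_duration_seconds(n, unit=DEFAULT_UNIT):
    """Pause length in seconds implied by CALL SLEEP(n, unit): n times unit."""
    return n * unit


first = sleep_duration_seconds(5)            # call sleep(5);
second = sleep_duration_seconds(0.005, 1)    # call sleep(0.005,1);
from_question = sleep_duration_seconds(0.005)  # CALL SLEEP (0.005)

print("call sleep(5)       ->", first, "s")
print("call sleep(0.005,1) ->", second, "s")
print("call sleep(0.005)   ->", from_question, "s")
print("first two equal:", math.isclose(first, second))
```

Under that reading, CALL SLEEP (0.005) from the question would request 0.005 milliseconds rather than 5 milliseconds.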
Sometimes it is better to try it out than to google:
data try_to_get_some_sleep;
  format n unit z5.3 evening morning expect measure time18.9 diff percent7.2;
  do unit = 5, 1, .5, .1, .01, .001;
    do n = .1, .9, 1.1, 1.9, 2 ;
      expect = n * unit;
      evening = time();
      call sleep (n, unit);
      morning = time();
      measure = morning - evening;
      diff = (measure - expect) / expect;
      output;
    end;
  end;
run;
results in
n unit evening morning expect measure diff
0.100 5.000 10:45:05.983414888 10:45:06.483422041 0:00:00.500000000 0:00:00.500007153 0.00%
0.900 5.000 10:45:06.483437061 10:45:10.984707117 0:00:04.500000000 0:00:04.501270056 0.03%
1.100 5.000 10:45:10.984720945 10:45:16.485454082 0:00:05.500000000 0:00:05.500733137 0.01%
1.900 5.000 10:45:16.485466003 10:45:25.984838009 0:00:09.500000000 0:00:09.499372005 (0.01%)
2.000 5.000 10:45:25.984853983 10:45:35.988686085 0:00:10.000000000 0:00:10.003832102 0.04%
0.100 1.000 10:45:35.988715887 10:45:36.088612080 0:00:00.100000000 0:00:00.099896193 (0.10%)
0.900 1.000 10:45:36.088624954 10:45:36.988639116 0:00:00.900000000 0:00:00.900014162 0.00%
1.100 1.000 10:45:36.988765001 10:45:38.089132071 0:00:01.100000000 0:00:01.100367069 0.03%
1.900 1.000 10:45:38.089145899 10:45:39.989645004 0:00:01.900000000 0:00:01.900499105 0.03%
2.000 1.000 10:45:39.989659071 10:45:41.989659071 0:00:02.000000000 0:00:02.000000000 0.00%
0.100 0.500 10:45:41.989671946 10:45:42.038803101 0:00:00.050000000 0:00:00.049131155 (1.74%)
0.900 0.500 10:45:42.038815975 10:45:42.488348961 0:00:00.450000000 0:00:00.449532986 (0.10%)
1.100 0.500 10:45:42.488362074 10:45:43.038013935 0:00:00.550000000 0:00:00.549651861 (0.06%)
1.900 0.500 10:45:43.038027048 10:45:43.987673044 0:00:00.950000000 0:00:00.949645996 (0.04%)
2.000 0.500 10:45:43.987685919 10:45:44.987751007 0:00:01.000000000 0:00:01.000065088 0.01%
0.100 0.100 10:45:44.987765074 10:45:44.996871948 0:00:00.010000000 0:00:00.009106874 (8.93%)
0.900 0.100 10:45:44.996876955 10:45:45.085994005 0:00:00.090000000 0:00:00.089117050 (0.98%)
1.100 0.100 10:45:45.086005926 10:45:45.195319891 0:00:00.110000000 0:00:00.109313965 (0.62%)
1.900 0.100 10:45:45.195332050 10:45:45.384675980 0:00:00.190000000 0:00:00.189343929 (0.35%)
2.000 0.100 10:45:45.384690046 10:45:45.585688114 0:00:00.200000000 0:00:00.200998068 0.50%
0.100 0.010 10:45:45.585701942 10:45:45.585707903 0:00:00.001000000 0:00:00.000005960 (99.4%)
0.900 0.010 10:45:45.585709095 10:45:45.595653057 0:00:00.009000000 0:00:00.009943962 10.5%
1.100 0.010 10:45:45.595659971 10:45:45.607652903 0:00:00.011000000 0:00:00.011992931 9.03%
1.900 0.010 10:45:45.607661009 10:45:45.626678944 0:00:00.019000000 0:00:00.019017935 0.09%
2.000 0.010 10:45:45.626689911 10:45:45.646678925 0:00:00.020000000 0:00:00.019989014 (0.05%)
0.100 0.001 10:45:45.646688938 10:45:45.646688938 0:00:00.000100000 0:00:00.000000000 ( 100%)
0.900 0.001 10:45:45.646689892 10:45:45.646689892 0:00:00.000900000 0:00:00.000000000 ( 100%)
1.100 0.001 10:45:45.646691084 10:45:45.647506952 0:00:00.001100000 0:00:00.000815868 (25.8%)
1.900 0.001 10:45:45.647507906 10:45:45.647620916 0:00:00.001900000 0:00:00.000113010 (94.1%)
2.000 0.001 10:45:45.647623062 10:45:45.650509119 0:00:00.002000000 0:00:00.002886057 44.3%
You can directly suspend a SAS session between steps using %SYSCALL SLEEP(... or %SYSFUNC(SLEEP(...
NOTE: When using %SYSCALL you need to pass macro variables as the arguments, not numeric literal text.
Example:
%let duration = 5;
%let unit = 0.001;
data one; set sashelp.class; run;
%syscall sleep(duration,unit);
data two; set sashelp.cars; run;
or
data one; set sashelp.class; run;
%let rc = %sysfunc ( sleep ( 5, 0.001 ));
data two; set sashelp.cars; run;

How do I plot data in a text file depending on the value present in one of the columns

I have a text file with a header and a few columns, which represents the results of experiments where some parameters were fixed to obtain some metrics. The file is in the following format:
A B C D E
0 0.5 0.2 0.25 0.75 1.25
1 0.5 0.3 0.12 0.41 1.40
2 0.5 0.4 0.85 0.15 1.55
3 1.0 0.2 0.11 0.15 1.25
4 1.0 0.3 0.10 0.11 1.40
5 1.0 0.4 0.87 0.14 1.25
6 2.0 0.2 0.23 0.45 1.55
7 2.0 0.3 0.74 0.85 1.25
8 2.0 0.4 0.55 0.55 1.40
I want to plot x = B, y = C for each fixed value of A and E. So for E=1.25, I want a series of line plots of x = B, y = C, one for each value of A, and then one such plot for each unique value of E.
Could anyone help with this?
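You could do a combination of groupby() and seaborn.lineplot():
import matplotlib.pyplot as plt
import seaborn as sns

# df is assumed to be the DataFrame read from the text file
for e, d in df.groupby('E'):
    fig, ax = plt.subplots()
    sns.lineplot(data=d, x='B', y='C', hue='A', ax=ax)
    ax.set_title(e)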
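To check what each plot will contain before drawing anything, the same grouping can be sketched with the standard library alone. The sample text is the table from the question; the names SAMPLE and series_by_e_and_a are hypothetical.

```python
from collections import defaultdict

# The table from the question; the first field of each data row is the row index.
SAMPLE = """\
A B C D E
0 0.5 0.2 0.25 0.75 1.25
1 0.5 0.3 0.12 0.41 1.40
2 0.5 0.4 0.85 0.15 1.55
3 1.0 0.2 0.11 0.15 1.25
4 1.0 0.3 0.10 0.11 1.40
5 1.0 0.4 0.87 0.14 1.25
6 2.0 0.2 0.23 0.45 1.55
7 2.0 0.3 0.74 0.85 1.25
8 2.0 0.4 0.55 0.55 1.40
"""


def series_by_e_and_a(text):
    """Return {E: {A: [(B, C), ...]}} with each series sorted by B."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    groups = defaultdict(lambda: defaultdict(list))
    for line in lines[1:]:
        # Drop the unnamed row index so the fields line up with the header.
        _, *fields = line.split()
        row = dict(zip(header, map(float, fields)))
        groups[row["E"]][row["A"]].append((row["B"], row["C"]))
    return {
        e: {a: sorted(points) for a, points in by_a.items()}
        for e, by_a in groups.items()
    }


result = series_by_e_and_a(SAMPLE)
for e in sorted(result):
    print(f"E = {e}")  # one figure per value of E
    for a, points in sorted(result[e].items()):
        print(f"  A = {a}: {points}")  # one line per value of A
```

Each top-level key corresponds to one figure and each inner key to one line in it. With the sample rows, some lines have only a single point, which is worth knowing before reading the plots.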

How to plot a multiindex dataframe having subplots for the first level index?

I have a pandas multiindex dataframe with quarters 1-4 and hours 0-23 as the index.
The data looks like this:
quarter hour value1 value2 value3
1 0 0.06 0.47 0.50
1 1 0.65 0.04 0.65
1 2 0.58 0.10 0.60
1 3 0.51 0.07 0.17
...
4 20 0.82 0.17 0.96
4 21 0.08 0.98 0.09
4 22 0.73 0.43 0.73
4 23 0.99 0.85 0.42
How can I plot 4 line graphs as subplots in a 2x2 arrangement, with Q1 and Q4 on the top and Q2 and Q3 on the bottom?
I have been trying with
f, ((ax1, ax4), (ax2, ax3)) = plt.subplots(2, 2, sharex='col', sharey='row')
ax1.plot(df.loc[1])
But it doesn't seem to work.

How do I refer to the current line/observation number in a loop in the data step?

For example, I have data on various latencies that I iterate through and apply a linear function to, as shown: (the function is just an example here)
data latency;
  input lat1 - lat20;
  array cost[20];
  array lat[20];
  do x = 1 to 20;
    cost[x] = lat[x] * 1.875;
  end;
  drop x;
  datalines;
0.42 0.85 0.59 0.06 0.21 0.35 0.1 0.08 0.85 0.53 0.81 0.44 0.47 0.2 0.99 0.32 0.18 0.87 0.33 0.84
0.11 0.83 0.02 0.59 0.74 0.65 0.76 0.45 0.57 0.22 0.2 0.13 0.42 0.15 0.05 0.51 0.48 0.95 0.39 0.92
0.8 0.9 0.65 0.29 0.77 0.0 0.24 0.05 0.16 0.72 0.58 0.9 0.35 0.63 0.79 0.41 0.73 0.36 0.82 0.16
0.74 0.21 0.57 0.73 0.83 0.78 0.77 0.92 0.13 0.39 0.52 0.14 0.1 0.77 0.68 0.99 0.26 0.37 0.97 0.83
;
run;
How can I save a variable with the current observation number in each iteration of the loop, so that I can use it in calculations later?
I know that proc print will automatically print the observation number, but how do I access this and store it in a variable in the data step? Is there a way to do this as SAS reads the datalines line by line?
I tried this, but then the obs variable is 2 for every observation.
data latency;
  input lat1 - lat20;
  obs = 1; * ADDED LINE;
  array cost[20];
  array lat[20];
  do x = 1 to 20;
    cost[x] = lat[x] * 1.875;
  end;
  obs = obs + 1; * ADDED LINE;
  drop x;
  datalines;
0.42 0.85 0.59 0.06 0.21 0.35 0.1 0.08 0.85 0.53 0.81 0.44 0.47 0.2 0.99 0.32 0.18 0.87 0.33 0.84
0.11 0.83 0.02 0.59 0.74 0.65 0.76 0.45 0.57 0.22 0.2 0.13 0.42 0.15 0.05 0.51 0.48 0.95 0.39 0.92
0.8 0.9 0.65 0.29 0.77 0.0 0.24 0.05 0.16 0.72 0.58 0.9 0.35 0.63 0.79 0.41 0.73 0.36 0.82 0.16
0.74 0.21 0.57 0.73 0.83 0.78 0.77 0.92 0.13 0.39 0.52 0.14 0.1 0.77 0.68 0.99 0.26 0.37 0.97 0.83
;
run;
proc print data=latency;
run;
This is a small example, but in reality I can't simply add a new variable that stores the line number to the start of each data line and read it in. That isn't practical for the actual data set.
You just need to add a retain statement so SAS doesn't reset obs to missing at every new observation.
data latency;
  retain obs 0;
  obs = obs + 1;
  ...
run;
Your first attempt was very close. Try again, but this time replace this line:
obs = 1; * ADDED LINE;
With this:
retain obs 0; * ADDED LINE;
That way, your obs variable will be retained across your entire dataset instead of being reset to 1 each time.

Strange profiler behavior: same functions, different performances

I was learning to use gprof and then I got weird results for this code:
int one(int a, int b)
{
    int i, r = 0;
    for (i = 0; i < 1000; i++)
    {
        r += b / (a + 1);
    }
    return r;
}
int two(int a, int b)
{
    int i, r = 0;
    for (i = 0; i < 1000; i++)
    {
        r += b / (a + 1);
    }
    return r;
}
int main()
{
    for (int i = 1; i < 50000; i++)
    {
        one(i, i * 2);
        two(i, i * 2);
    }
    return 0;
}
and this is the profiler output
  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
 50.67      1.14     1.14    49999    22.80    22.80  two(int, int)
 49.33      2.25     1.11    49999    22.20    22.20  one(int, int)
If I call one and then two, the result is the inverse: two takes more time than one.
Both are the same function, but the first call always takes less time than the second.
Why is that?
Note: The assembly code is exactly the same, and the code is compiled with no optimizations.
I'd guess it is some fluke in run-time optimisation - one uses a register and the other doesn't or something minor like that.
The system clock probably runs to a precision of 100nsec. The average call time 30nsec or 25nsec is less than one clock tick. A rounding error of 5% of a clock tick is pretty small. Both times are near enough zero.
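A rough way to judge whether the gap in the flat profile from the question is meaningful is to express it in profiler samples. The sketch below assumes a sampling period of 0.01 s, which gprof commonly reports but which is not shown in the question, and it treats the sample counts as roughly Poisson distributed. Both are assumptions, and the helper name to_samples is hypothetical.

```python
import math

SAMPLE_PERIOD_S = 0.01  # assumed sampling period; not shown in the question

# "self seconds" column from the flat profile in the question.
SELF_SECONDS = {"one": 1.11, "two": 1.14}


def to_samples(seconds, period=SAMPLE_PERIOD_S):
    """Number of profiler samples that add up to the reported self time."""
    return round(seconds / period)


samples = {name: to_samples(s) for name, s in SELF_SECONDS.items()}
gap = samples["two"] - samples["one"]

# Rough noise scale for the difference of two Poisson counts: sqrt(n1 + n2).
noise = math.sqrt(samples["one"] + samples["two"])

print("samples:", samples)
print("gap in samples:", gap)
print("noise scale in samples:", noise)
```

Under these assumptions the gap is a handful of samples while the noise scale is several times larger, so a single run cannot separate the two functions. A consistent ordering effect would only become visible across repeated runs or a much longer run.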
My guess: it is an artifact of the way mcount data gets interpreted. The granularity for mcount (monitor.h) is on the order of a 32 bit longword - 4 bytes on my system. So you would not expect this: I get different reports from prof vs gprof on the EXACT same mon.out file.
Solaris 9 -
prof
 %Time  Seconds  Cumsecs     #Calls  msec/call  Name
  46.4     2.35     2.35   59999998     0.0000  .div
  34.8     1.76     4.11  120000025     0.0000  _mcount
  10.1     0.51     4.62          1   510.      main
   5.3     0.27     4.89   29999999     0.0000  one
   3.4     0.17     5.06   29999999     0.0000  two
   0.0     0.00     5.06          1     0.      _fpsetsticky
   0.0     0.00     5.06          1     0.      _exithandle
   0.0     0.00     5.06          1     0.      _profil
   0.0     0.00     5.06         20     0.0     _private_exit, _exit
   0.0     0.00     5.06          1     0.      exit
   0.0     0.00     5.06          4     0.      atexit
gprof
   %  cumulative    self              self    total
 time   seconds   seconds    calls  ms/call  ms/call  name
 71.4      0.90      0.90        1   900.00   900.00  key_2_text <cycle 3> [2]
  5.6      0.97      0.07   106889     0.00     0.00  _findbuf [9]
  4.8      1.03      0.06   209587     0.00     0.00  _findiop [11]
  4.0      1.08      0.05                             __do_global_dtors_aux [12]
  2.4      1.11      0.03                             mem_init [13]
  1.6      1.13      0.02   102678     0.00     0.00  _doprnt [3]
  1.6      1.15      0.02                             one [14]
  1.6      1.17      0.02                             two [15]
  0.8      1.18      0.01   414943     0.00     0.00  realloc <cycle 3> [16]
  0.8      1.19      0.01   102680     0.00     0.00  _textdomain_u <cycle 3> [21]
  0.8      1.20      0.01   102677     0.00     0.00  get_mem [17]
  0.8      1.21      0.01                             $1 [18]
  0.8      1.22      0.01                             $2 [19]
  0.8      1.23      0.01                             _alloc_profil_buf [22]
  0.8      1.24      0.01                             _mcount (675)
Is it always the first one called that is slightly slower? If that's the case, I would guess it is the CPU cache doing its thing, or it could be lazy paging by the operating system.
BTW: what optimization flags are you compiling with?