I just performed the fisher test in R and in Excel on a 2x2 table with the contents 1 6 and 7 2. I can't manage to do this in sas.
data my_table;
input var1 var2 ##;
datalines;
1 6 7 2
;
proc freq data=my_table;
tables var1*var2 / fisher;
run;
The test somehow ignores that the table consists of the 4 variables but when I print the table it looks fine. In the test the contents of the table are 0, 1, 1, 0. I guess I need to change something when creating the data but what?
You do NOT have two variables that each have two categories.
Read the data in this way instead.
data have ;
do var1=1,2 ; do var2=1,2;
input count ##;
output;
end; end;
datalines;
1 6 7 2
;
Now VAR1 and VAR2 both have two possible values and COUNT has the number of cases for the particular combination. Use the WEIGHT statement to tell PROC FREQ to use COUNT as the number of cases.
proc freq data=have ;
weight count ;
tables var1*var2 / fisher ;
run;
Related
I try to show all orders of Mode.
For example, I import excel like:
A
1
1
2
3
3
3
and code is :
ods select Modes;
proc univariate data=Want modes;
var A;
run;
this Result shows like:
Mode Count
3 3
I want to show like
Mode Count
3 3
1 2
2 1
how can I do that???
Your desired output is actually not modes. Modes returns most frequent value or values (if there is more than one with the same frequency) with the corresponding count. In your example, there is only one mode (3), as it is the value with the highest frequency. And that's what the result shows.
You may be interested in showing frequencies of every value present in variable A. In that case, you want to use this code:
ods select Frequencies;
proc univariate data=Want freq;
var A;
run;
That is a frequency table.
data have ;
input A ##;
cards;
1 1 2 3 3 3
;
proc freq data=have order=freq ;
tables a / out=counts;
run;
proc print data=counts;
run;
Result:
Obs A COUNT PERCENT
1 3 3 50.0000
2 1 2 33.3333
3 2 1 16.6667
I am new to sas and are trying to handle some customer data, and I'm not really sure how to do this.
What I have:
data transactions;
input ID $ Week Segment $ Average Freq;
datalines;
1 1 Sports 500 2
1 1 PC 400 3
1 2 Sports 350 3
1 2 PC 550 3
2 1 Sports 650 2
2 1 PC 700 3
2 2 Sports 720 3
2 2 PC 250 3
;
run;
What I want:
data transactions2;
input ID Week1_Sports_Average Week1_PC_Average Week1_Sports_Freq
Week1_PC_Freq
Week2_Sports_Average Week2_PC_Average Week2_Sports_Freq Week2_PC_Freq;
datalines;
1 500 400 2 3 350 550 3 3
2 650 700 2 3 720 250 3 3
;
run;
The only thing I got so far is this:
Data transactions3;
SET transactions;
if week=1 and Segment="Sports" then DO;
Week1_Sports_Freq=Freq;
Week1_Sports_Average=Average;
END;
else DO;
Week1_Sports_Freq=0;
Week1_Sports_Average=0;
END;
run;
This will be way too much work as I have a lot of weeks and more variables than just freq/avg.
Really hoping for some tips are, as I'm stucked.
You can use PROC TRANSPOSE to create that structure. But you need to use it twice since your original dataset is not fully normalized.
The first PROC TRANSPOSE will get the AVERAGE and FREQ readings onto separate rows.
proc transpose data=transactions out=tall ;
by id week segment notsorted;
var average freq ;
run;
If you don't mind having the variables named slightly differently than in your proposed solution you can just use another proc transpose to create one observation per ID.
proc transpose data=tall out=want delim=_;
by id;
id segment _name_ week ;
var col1 ;
run;
If you want the exact names you had before you could add data step to first create a variable you could use in the ID statement of the PROC transpose.
data tall ;
set tall ;
length new_name $32 ;
new_name = catx('_',cats('WEEK',week),segment,_name_);
run;
proc transpose data=tall out=want ;
by id;
id new_name;
var col1 ;
run;
Note that it is easier in SAS when you have a numbered series of variable if the number appears at the end of the name. Then you can use a variable list. So instead of WEEK1_AVERAGE, WEEK2_AVERAGE, ... you would use WEEK_AVERAGE_1, WEEK_AVERAGE_2, ... So that you could use a variable list like WEEK_AVERAGE_1 - WEEK_AVERAGE_5 in your SAS code.
Here is my data :
data example;
input id sports_name;
datalines;
1 baseball
1 basketball
1 cricket
1 soccer
2 golf
2 fencing
This is just a sample. The variable sports_name is categorical with 56 types.
I am trying to transpose the data to wide form where each row would have a user_id and the names of sports as the variables with values being 1/0 indicating Presence or absence.
So far, I used proc freq procedure to get the cross tabulated frequency table and put that in a different data set and then transposed that data. Now i have missing values in some cases and count of the sports in rest of the cases.
Is there any better way to do this?
Thanks!!
You need a way to create something from nothing. You could have also used the SPARSE option in PROC FREQ. SAS names cannot have length greater than 32.
data example;
input id sports_name :$16.;
retain y 1;
datalines;
1 baseball
1 basketball
1 cricket
1 soccer
2 golf
2 fencing
;;;;
run;
proc print;
run;
proc summary data=example nway completetypes;
class id sports_name;
output out=freq(drop=_type_);
run;
proc print;
run;
proc transpose data=freq out=wide(drop=_name_);
by id;
var _freq_;
id sports_name;
run;
proc print;
run;
Same theory here, generate a list of all possible combinations using SQL instead of Proc Summary and then transposing the results.
data example;
informat sports_name $20.;
input id sports_name $;
datalines;
1 baseball
1 basketball
1 cricket
1 soccer
2 golf
2 fencing
;
run;
proc sql;
create table complete as
select a.id, a_x.sports_name, case when not missing(e.sports_name) then 1 else 0 end as Present
from (select distinct ID from example) a
cross join (select distinct sports_name from example) a_x
full join example as e
on e.id=a.id
and e.sports_name=a_x.sports_name;
quit;
proc transpose data=complete out=want;
by id;
id sports_name;
var Present;
run;
Another question. I have multiple data sets that generate ouput how can output these into one excel work sheet and apply my own formating. For example I have data set 1, data set 2, data set 3
each data set has two coloumns, for example
Col 1 Col 2
1 2
3 4
5 6
I want each data set to be in one worksheet and seperated by column , so in excel it should look like
Col 1 Col 2 Blank Col Col 1 Col 2 Blank Col
Somone told me I need to look at DDE for this is this true
Regards,
You can definitely do it using DDE. What DDE does it just simulates user's clicks at Excel's menus, buttons, cells etc. Here's an example how you can do that with macro loop for 3 datasets with names have1, have2 and have3. If you need more general solution (unknown number of datasets, with various number of variables, random datasets' names etc), the code should be updated, but its 'DDE-part' will be essentially pretty the same.
One more assumption - your Excel workbook should be open during code execution. Though it can be also automated - Excel can be started and file can be open using DDE itself.
You can find a very nice introduction into DDE here, where all these trick discussed in details.
data have1;
input Col1 Col2;
datalines;
1 2
3 4
5 6
;
run;
data have2;
input Col1 Col2;
datalines;
1 2
3 4
5 6
7 8
;
run;
data have3;
input Col1 Col2;
datalines;
1 2
3 4
7 8
5 6
9 10
;
run;
%macro xlsout;
/*iterating through your datasets*/
%do i=1 %to 3;
/*determine number of records in the current dataset*/
proc sql noprint;
select count(*) into :noobs
from have&i;
quit;
/*assign a range on the workbook spreadsheet matching to data in the current dataset*/
filename range dde "excel|[myworkbook.xls]sas!r1c%eval((&i-1)*3+1):r%left(&noobs)c%eval((&i-1)*3+2)" notab;
/*put data into selected range*/
data _null_;
set have&i;
file range;
put Col1 '09'x Col2;
run;
%end;
%mend xlsout;
%xlsout
You cannot do exactly this with SAS (DDE is probably possible). I would suggest looking at SaviCells Pro.
http://www.sascommunity.org/wiki/SaviCells
http://www.savian.net/utilities.html
You could likely accomplish what you're asking through ODS TAGSETS.EXCELXP or the new ODS EXCEL (9.4 TS1M1). You would need to arrange the datasets ahead of time (ie, merge them together or transpose or whatnot to get one dataset with the right columns), however, or else use PROC REPORT or some other procedure to get them in the right format.
Hi another quick question
in proc sql we have on which is used for conditional join is there something similar for sas data step
for example
proc sql;
....
data1 left join data2
on first<value<last
quit;
can we replicate this in sas datastep
like
data work.combined
set data1(in=a) data2(in=b)
if a then output;
run;
You can also can reproduce sql join in one DATA-step using hash objects. It can be really fast but depends on the size of RAM of your machine since this method loads one table into memory. So the more RAM - the larger dataset you can wrap into hash. This method is particularly effective for look-ups in relatively small reference table.
data have1;
input first last;
datalines;
1 3
4 7
6 9
;
run;
data have2;
input value;
datalines;
2
5
6
7
;
run;
data want;
if _N_=1 then do;
if 0 then set have2;
declare hash h(dataset:'have2');
h.defineKey('value');
h.defineData('value');
h.defineDone();
declare hiter hi('h');
end;
set have1;
rc=hi.first();
do while(rc=0);
if first<value<last then output;
rc=hi.next();
end;
drop rc;
run;
The result:
value first last
2 1 3
5 4 7
6 4 7
7 6 9
Yes there is a simple (but subtle) way in just 7 lines of code.
What you intend to achieve is intrinsically a conditional Cartesian join which can be done by a do-looped set statement. The following code use the test dataset from Dmitry and a modified version of the code in the appendix of SUGI Paper 249-30
data data1;
input first last;
datalines;
1 3
4 7
6 9
;
run;
data data2;
input value;
datalines;
2
5
6
7
;
run;
/***** by data step looped SET *****/
DATA CART_data;
SET data1;
DO i=1 TO NN; /*NN can be referenced before set*/
SET data2 point=i nobs=NN; /*point=i - random access*/
if first<value<last then OUTPUT; /*conditional output*/
END;
RUN;
/***** by SQL *****/
proc sql;
create table cart_SQL as
select * from data1
left join data2
on first<value<last;
quit;
One can easily see that the results coincide.
Also note that from SAS 9.2 documentation: "At compilation time, SAS reads the descriptor portion of each data set and assigns the value of the NOBS= variable automatically. Thus, you CAN refer to the NOBS= variable BEFORE the SET statement. The variable is available in the DATA step but is not added to any output data set."
There isn't a direct way to do this with a MERGE. This is one example where the SQL method is clearly superior to any SAS data step methods, as anything you do will take much more code and possibly more time.
However, depending on the data, it's possible a few approaches may make sense. In particular, the format merge.
If data1 is fairly small (even, say, millions of records), you can make a format out of it. Like so:
data fmt_set;
set data1;
format label $8.;
start=first; *set up the names correctly;
end=last;
label='MATCH';
fmtname='DATA1F';
output;
if _n_=1 then do; *put out a hlo='o' line which is for unmatched lines;
start=.; *both unnecessary but nice for clarity;
end=.;
label='NOMATCH';
hlo='o';
output;
end;
run;
proc format cntlin=fmt_set; *import the dataset;
quit;
data want;
set data2;
if put(value,DATA1F.)="MATCH";
run;
This is very fast to run, unless data1 is extremely large (hundreds of millions of rows, on my system) - faster than a data step merge, if you include sort time, since this doesn't require a sort. One major limitation is that this will only give you one row per data2 row; if that is what is desired, then this will work. If you want repeats of data2 then you can't do it this way.
If data1 may have overlapping rows (ie, two rows where start/end overlap each other), you also will need to address this, since start/end aren't allowed to overlap normally. You can set hlo="m" for every row, and "om" for the non-match row, or you can resolve the overlaps.
I'd still do the sql join, however, since it's much shorter to code and much easier to read, unless you have performance issues, or it doesn't work the way you want it to.
Here's another solution, using a temporary array to hold the lookup dataset. Performance is probably similar to Dmitry's hash-based solution, but this should also work for people still using versions of SAS prior to 9.1 (i.e. when hash objects were first introduced).
I've reused Dmitry's sample datasets:
data have1;
input first last;
datalines;
1 3
4 7
6 9
;
run;
data have2;
input value;
datalines;
2
5
6
7
;
run;
/*We need a macro var with the number of obs in the lookup dataset*/
/*This is so we can specify the dimension for the array to hold it*/
data _null_;
if 0 then set have2 nobs = nobs;
call symput('have2_nobs',put(nobs,8.));
stop;
run;
data want_temparray;
array v{&have2_nobs} _temporary_;
do _n_ = 1 to &have2_nobs;
set have2 (rename=(value=value_array));
v{_n_}=value_array;
end;
do _n_ = 1 by 1 until (eof_have1);
set have1 end = eof_have1;
value=.;
do i=1 to &have2_nobs;
if first < v{i} < last then do;
value=v{i};
output;
end;
end;
if missing(value) then output;
end;
drop i value_array;
run;
Output:
value first last
2 1 3
5 4 7
6 4 7
7 6 9
This matches the output from the equivalent SQL:
proc sql;
create table want_sql as
select * from
have1 left join have2
on first<value<last
;
quit;
run;