Unique combinations - sas

I'm looking to examine unique combinations of 7 binary variables (cannabis modes of delivery [yes/no]) and was under the impression this should be a fairly simple task in SAS. However, all of the coding examples I've come across online seem a bit overcomplicated for such a basic process. If anyone has insight regarding this concept I would appreciate it!

If you want the counts, it is probably easiest to use PROC SUMMARY.
proc summary data=have nway missing ;
class var1-var7 ;
output out=want(rename=(_freq_=count));
run;

proc sort data=yourdataset(keep=Variables to examine separated by blanks) out=choose_a_name nodupkey;
by Variables to examine separated by blanks;
run;
The resulting dataset, choose_a_name, will contain all unique combinations of the examined variables.

Related

Dealing with missing values in SAS?

I am working with a SAS dataset that has few observations, and I need to relate a variable to its lag.
Doing this costs me a record, because the lagged variable has a missing value.
Does anyone know how Base SAS handles such values in procedures such as PROC REG or PROC CORR?
Thank you all in advance.
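It depends a bit on what exactly you're doing. PROC REG excludes missing values listwise: if any variable used or referenced in the PROC is missing in a row, that row is excluded. PROC CORR, by default, excludes them pairwise instead: each correlation uses every row in which both variables of that pair are nonmissing. With only two variables, such as a variable and its lag, the two approaches give the same result.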
http://documentation.sas.com/?docsetId=lrcon&docsetTarget=p175x77t7k6kggn1io94yedqagl3.htm&docsetVersion=9.4&locale=en
PROC CORR has options that control how it handles missing values. The NOMISS option tells SAS to use only complete cases (listwise deletion).
https://blogs.sas.com/content/iml/2012/01/13/missing-values-pairwise-corr.html
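As a rough illustration of the lag scenario in the question, the sketch below runs PROC CORR with and without NOMISS. The dataset name have and the variable x are placeholders, not names from the original post.

```sas
/* Placeholder names: 'have' is the input dataset, 'x' the variable of interest */
data with_lag;
  set have;
  x_lag = lag(x);   /* the first observation gets a missing x_lag */
run;

/* Without NOMISS: each correlation uses the rows where both variables are nonmissing */
proc corr data=with_lag;
  var x x_lag;
run;

/* With NOMISS: only rows that are complete across all VAR variables are used */
proc corr data=with_lag nomiss;
  var x x_lag;
run;
```

With only two variables in the VAR statement, both runs should drop the same single row, so the option starts to matter only once more variables are involved.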

Missing values in a FREQ (SAS)

I'm going to ask this with an example...
Suppose I have a data set where each observation represents a person. Two of the variables are AGE and HASADOG (say this has values 1 for yes and 2 for no). Is there a way to run a PROC FREQ (by AGE*HASADOG) that forces SAS to include in the report a line for instances where the count is zero?
By this I mean: if there is a particular value for AGE such that no observation with this AGE value has a 1 in the HASADOG variable, the report will still include a row for this combination (with a row percent of 0.)
Is this possible?
The SPARSE option in PROC FREQ is likely all you need.
proc freq data=sashelp.class;
table sex*age / sparse list;
run;
If the value is nowhere in your data set at all, then there's no way for SAS to know it exists. In this case you'd need a more complex solution: a way to tell SAS ahead of time all the values you would be using. This can be done via the PRELOADFMT or CLASSDATA options on several procs. There are already answered questions on this topic here on SO, so I won't provide a solution for this option, which seems beyond the scope of your question.
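For readers curious about the CLASSDATA route, here is a minimal sketch using PROC SUMMARY rather than PROC FREQ. The dataset people, the age range, and the output names are all hypothetical; the class variables in the CLASSDATA= data set need the same type and format as in the input data.

```sas
/* Hypothetical list of every AGE x HASADOG combination that should be reported */
data all_levels;
  do age = 18 to 20;
    do hasadog = 1 to 2;
      output;
    end;
  end;
run;

/* Combinations listed in all_levels but absent from people appear with a count of 0 */
proc summary data=people nway classdata=all_levels;
  class age hasadog;
  output out=counts(drop=_type_ rename=(_freq_=count));
run;
```

Adding the EXCLUSIVE option would further restrict the output to only the combinations listed in all_levels.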

Comparing two datasets in sas

I have used PROC COMPARE to compare two datasets and get the details of the differences. But I just want to know whether the two datasets are the same or not (in both content and number of rows). Say I have two datasets, A and B. I don't need any other details of the differences; I just need a flag set to 1 if the datasets are different or to 0 if they are the same. Is there a way to do this? I searched the internet, but all I could find was PROC COMPARE used in various ways.
Thanks in advance
You can use the SYSINFO automatic macro variable:
proc compare noprint base=baseds compare=compareds;
run;
%if %eval(&sysinfo ge 8) %then %do; ...
There is a great SAS paper describing the return codes in meticulous detail, available here.
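A minimal sketch of turning the return code into the 0/1 flag the question asks for might look like this. The macro variable names rc and flag are illustrative, and treating any nonzero code as "different" is an assumption: SYSINFO also reports attribute differences such as labels and formats, so a stricter or looser test may suit your case.

```sas
proc compare base=baseds compare=compareds noprint;
run;

/* SYSINFO is overwritten by the next step, so capture it immediately */
%let rc = &sysinfo;

/* flag = 0 when PROC COMPARE found no differences at all, 1 otherwise */
%let flag = %eval(&rc ne 0);

%put NOTE: compare return code=&rc flag=&flag;
```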

Understanding SAS output data sets

SAS has several forms it uses to create output data sets from within a procedure. It is not always clear whether or not a particular procedure can generate a data set and, if it seems to be able to, it's not always clear how.
Off the top of my head, here are some examples of how widely the syntax can differ.
Example 1
proc sort data = sashelp.baseball out = baseball_sorted;
by
league
division
;
run;
Example 2
proc means noprint data = baseball_sorted;
by
league
division
;
var nHits;
output
out = baseball_avg_hits (drop = _TYPE_ _FREQ_)
mean = mean_hits
;
run;
Example 3
ods exclude all;
ods output
statistics = baseball_statistics
equality = baseball_ftest
;
proc ttest data = baseball_sorted;
class league;
var nHits;
run;
ods exclude none;
Example 4
The PROC ANOVA OUTSTAT= option.
It seems almost as if SAS has implemented each of these willy-nilly. Is the SAS syntax dictating how to create a data set directed by some consistent approach I am not seeing or is it truly capricious and arbitrary?
For PROC code, the syntax for outputting data is often specific to that procedure, which often feels willy-nilly. (Your examples 1, 2, 4) I think PROC developers are given a lot of freedom, and remember that many of these PROCS are 30+ years old.
The great thing about the Output Delivery System (ODS, your example 3) is it provides a single syntax for outputting data, regardless of the procedure. So you can use the ODS OUTPUT statement with (almost?) any PROC. The names and structures of the output objects will of course vary between PROCs. So if you are looking for a consistent approach, I would focus on using ODS OUTPUT. ODS was added in V7 (I think).
It would be interesting to try to find an example of an output dataset which could be made by a PROC but could not be made by ODS OUTPUT. I hope there aren't any. If that is the case, you could consider the range of OUTPUT statements/options within PROCs as legacy code.
Agree with Quentin. You have to remember that there are SAS systems out there running code written in the 80s. SAS would have a huge headache if they forced every team to rewrite all the procedures and then forced their customers to change all their code. SAS has been around since the 60s and the organic growth of the syntax is to be expected.
FWIW, having an OUT= option makes sense on things with no graphical output, e.g. PROC SORT or PROC TRANSPOSE.
The way I see it, there are four main ways to specify output data sets:
1. In the PROC statement, you may be able to specify output options such as OUT= or OUTEST=.
2. The main statement of the procedure (e.g. MODEL or TABLE) can have options that allow for output; e.g. PROC FREQ has an OUT= option on the TABLE statement.
3. An explicit OUTPUT statement within a procedure, e.g. PROC MEANS. These typically come from older procedures.
4. ODS tables, a relatively newer method that is more frequently used these days since the format aligns with what you'd expect to see.
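To make the four approaches concrete, here is a small sketch with one example of each, using sashelp.class. The output dataset names are arbitrary choices for illustration.

```sas
/* 1. Option on the PROC statement */
proc reg data=sashelp.class outest=reg_est noprint;
  model weight = height;
run;
quit;

/* 2. Option on the main statement (TABLES in PROC FREQ) */
proc freq data=sashelp.class noprint;
  tables sex / out=freq_sex;
run;

/* 3. Explicit OUTPUT statement */
proc means data=sashelp.class noprint;
  var height;
  output out=means_height mean=mean_height;
run;

/* 4. ODS OUTPUT; OneWayFreqs is the ODS table name for a one-way frequency table */
ods output OneWayFreqs=ods_freq_sex;
proc freq data=sashelp.class;
  tables sex;
run;
```

ODS TRACE ON can be used to discover the ODS table names a given procedure produces.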
Yes, there are multiple places to check, but fortunately the SAS documentation for procedures is relatively clear with the options and how to use/specify the outputs.
If I've missed anything that seems different, post in the comments and I can update this.
PS. Although SAS is definitely bad, trying to navigate different packages/modules in Python to export an XLSX file isn't straightforward either. Some packages support some options, others don't. I've given up on asking why these days and just accept it as peculiarities of the different languages at this point.

Interpreting WHERE= in Proc

I'm converting SAS code to a SQL-based process and came across this fairly simple snippet.
proc freq data=temp1(where=(SAMERETAIL='Y')) noprint;
tables RETAILER*store/list nocum nopercent out=retailer_list;
run;
My interpretation of this is:
From Temp1:
Choose all observations which fit the criteria (sameretail=Y)
Extract Retail, Store frequency counts:
Store Retailer Count(*)
Output to Retailer_List.
The question I have is about the WHERE=. Is it applied to the PROC or to the data? Is my interpretation correct? Business-wise this looks incorrect, since we are restricting the records to only those with the flag = Y. Hence the question.
Any pointers?
Any help is greatly appreciated.
TIA.
where= is applied to the dataset you're pulling observations from to use in the proc freq. SUGI 24 has a good summary of how this works (see page 3).
THE WHERE DATA SET OPTION
The syntax of the WHERE data set option, called WHERE=, is a combination of standard data set option parentheses and a where-expression.
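Since the goal is a SQL-based process, a rough PROC SQL equivalent of the PROC FREQ step could look like the sketch below. It rests on two assumptions: that only the counts are needed (the PROC FREQ OUT= data set also carries a PERCENT column), and that RETAILER and STORE have no missing values, since PROC FREQ drops those rows by default while GROUP BY keeps them.

```sas
proc sql;
  create table retailer_list as
  select retailer,
         store,
         count(*) as count
  from temp1
  where sameretail = 'Y'   /* same filter as the WHERE= data set option */
  group by retailer, store;
quit;
```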