Interpreting WHERE= in Proc - sas

Converting a SAS code to Sql based process. Came across this pretty simple snippet.
proc freq data=temp1(where=(SAMERETAIL='Y')) noprint;
tables RETAILER*store/list nocum nopercent out=retailer_list;
run;
My interpretation of this is:
From Temp1:
Choose all observations which fit the criteria (sameretail=Y)
Extract Retail, Store frequency counts:
Store Retailer Count(*)
Output to Retailer_List.
The question I have is on the WHERE=. Is this applied to the Proc or Data? Is my interpretation correct? Business wise this is incorrect since we are only restricting the records with the flag=Y. Hence the question.
Any pointers?
Any help is greatly appreciated.
TIA.

where= is applied to the dataset you're pulling observations from to use in the proc freq. SUGI 24 has a good summary of how this works (see page 3).
THE WHERE DATA SET OPTION
The syntax of the WHERE data set option, called
WHERE=, is a combination of standard data set
option parentheses and a where-expression.

Related

Unique combinations

I'm looking to examine unique combinations of 7 binary variables (cannabis modes of delivery [yes/no]) and was under the impression this should be a fairly simple task in SAS. However, all of the coding examples I've come across online seem a bit overcomplicated for such a basic process. If anyone has insight regarding this concept I would appreciate it!
So if you want the counts probably it easiest to use PROC SUMMARY.
proc summary data=have nway missing ;
class var1-var7 ;
output out=want(rename=(_freq_=count));
run;
proc sort data=yourdataset(Keep=Variables to examine seperated by blanks) out=choose_a_name noduprec;
by Variables to examine seperated by blanks;
run;
Resulting dataset choose_a_name will contain all unique combinations of examined variables.

Dealing with missing values in SAS?

I am handling a SAS dataset with little observations and I need to put on relation a variable with its lag.
By doing this, I lose a record resulting in a missing values.
Do some of you know how SAS Base handles such items in the procedures as PROC REG or PROC CORR?
Thanks you all in advance.
It depends a bit on what exactly you're doing. Based on your references to PROC REG and CORR - it excludes the missing values - listwise. This means if any value is missing in a row that is being used or referenced in the PROC it will be excluded.
http://documentation.sas.com/?docsetId=lrcon&docsetTarget=p175x77t7k6kggn1io94yedqagl3.htm&docsetVersion=9.4&locale=en
PROC CORR has several options that allow you to specify how it handles the missing values. The NOMISS option tells SAS to use only complete cases.
https://blogs.sas.com/content/iml/2012/01/13/missing-values-pairwise-corr.html

Missing values in a FREQ (SAS)

I'm going to ask this with an example...
Suppose i have a data set where each observation represents a person. Two of the variables are AGE and HASADOG (and say this has values 1 for yes and 2 for no.) Is there a way to run a PROC FREQ (by AGE*HASADOG) that forces SAS to include in the report a line for instances where the count is zero?
By this I mean: if there is a particular value for AGE such that no observation with this AGE value has a 1 in the HASADOG variable, the report will still include a row for this combination (with a row percent of 0.)
Is this possible?
The SPARSE option in PROC FREQ is likely all you need.
proc freq data=sashelp.class;
table sex*age / sparse list;
run;
If the value is nowhere in your data set at all, then there's no way for SAS to know it exists. In this case you'd need a more complex solution, basically a way to tell SAS all values you would be using ahead of time. This can be done via a PRELOADFMT or CLASSDATA option on several procs. There are asked an answered questions on this topic here on SO, so I won't provide a solution for this option, which seems beyond the scope of your question.

error message box cox in proc transreg procedure in sas

I try to use the proc transreg procedure in SAS, to transform one of my variables in a dataset (var1). The var1 variable has values >=0.
My code is:
proc transreg data=data1 details;
model boxcox(var1/lambda=-1 to 1 by 0.125 convenient parameter=1)=identity(var2);
output out=BoxCox_Out;
run;
However I get the following error message:
"observation of nonblank TYPE not equal 'Score ' are excluded from the analysis and the output data set.
Could anyone help me?
_TYPE_ can be used for TRANSREG to allow you to take datasets with multiple kinds of rows and only use the SCORE rows (or whichever ones you choose), often outputs from earlier TRANSREG procedures.
However, _TYPE_ is also a common variable added by procedures like PROC MEANS to indicate which class combinations apply to the row. In this case, TRANSREG is getting confused and thinking you want something different.
Drop the _TYPE_ variable in the TRANSREG data source statement, and it should use all rows.
proc transreg data=data1(drop=_type_) details;

Outputting the top 10% in SAS

How can I limit the number of observations that are output in SAS to the top 10%? I know about the obs= function, but I haven't been able to figure out how to make the obs= lead to a percentage.
Thanks in advance!
As far as I know there's not a direct way to do what you're asking. You could pretty easily write a macro to do this, though.
Assuming you're asking to view the first 10% of records in a PROC PRINT, you could do something like this.
%macro top10pct(lib=WORK,dataset=);
proc sql noprint;
select max(ceil(0.1*nlobs)) into :_nobs
from dictionary.tables
where upcase(libname)=upcase("&lib.") and upcase(memname)=upcase("&dataset.");
quit;
proc print data=&lib..&dataset.(obs=&_nobs.);
run;
%mend top10pct;
dictionary.tables has all of the PROC CONTENTS information available to it, including the number of logical observations (NLOBS). This number is not 100% guaranteed to be accurate if you've been doing things to the dataset like deleting observations in SQL, but for SAS datasets this is almost always accurate, or close enough. For RDBMS tables this may be undefined or may not be accurate.