How do I calculate the range of a variable in SAS?

I have a table in SAS with a variable, say X.
I only want to know the range of X. I used PROC UNIVARIATE, but it prints a lot of other information.
I have been trying to use the RANGE function in the following way, but it doesn't yield any result. Please help!
DATA DATASET2;
SET DATASET1;
R=RANGE(X);
KEEP R;
RUN;
PROC PRINT DATASET2;
RUN;

The RANGE function works across its arguments within a single row, but you applied it to a column. With only one argument it returns 0 for every row where X is not missing, so you probably got zeros.
The RANGE function is used like this:
R = range(x, y, z);
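A minimal sketch of the row-wise behaviour, using a made-up dataset SCORES with columns x, y and z (all names here are assumptions for illustration only):

```sas
/* Hypothetical three-column dataset to illustrate row-wise RANGE */
data scores;
  input x y z;
  r_row = range(x, y, z);  /* largest minus smallest value within this row */
  r_one = range(x);        /* a single non-missing argument gives 0 */
  datalines;
1 5 9
4 4 4
10 2 7
;
run;

proc print data=scores;
run;
```

The r_one column shows why the DATA step in the question produced nothing useful: with one argument there is nothing to take a difference against.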
To get the range within a column, use PROC MEANS:
proc means data=sashelp.class range maxdec=2;
var age;
run;
Or use PROC SQL, as shown below:
proc sql;
select max(age) - min(age) as range
from sashelp.class;
quit;
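If the range is needed in a dataset rather than in printed output, PROC MEANS can write it out with an OUTPUT statement. This is a sketch; the dataset name range_out and the variable name age_range are arbitrary choices, not anything from the question:

```sas
/* NOPRINT suppresses the listing; only the RANGE statistic is kept */
proc means data=sashelp.class noprint;
  var age;
  output out=range_out(drop=_type_ _freq_) range=age_range;
run;

proc print data=range_out noobs;
run;
```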

You can also use the range function in proc sql, where it operates on columns rather than rows:
proc sql;
select range(age) from sashelp.class;
quit;
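Because RANGE acts as a summary function in PROC SQL, it can also be combined with GROUP BY. A small sketch, assuming a per-group range is of interest (the alias age_range is illustrative):

```sas
/* One row per value of SEX, each with the range of AGE in that group */
proc sql;
  select sex, range(age) as age_range
  from sashelp.class
  group by sex;
quit;
```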
This is also possible within a data step, if you don't like sql:
data _null_;
set sashelp.class end = eof;
retain min_age max_age;
min_age = min(age,min_age);
max_age = max(age,max_age);
if eof then do;
range = max_age - min_age;
put range= min_age= max_age=;
end;
run;
Or equivalently:
data _null_;
do until (eof);
set sashelp.class end = eof;
min_age = min(age,min_age);
max_age = max(age,max_age);
end;
range = max_age - min_age;
put range= min_age= max_age=;
run;

Related

Drop variables with all-zero values from a SAS data set

I often work with a large number of variables that have zero or empty values only, but I could not find a SAS command to drop these unwanted variables. I know we can use SAS/IML, but I encountered such cases many times and would like to have a macro that may help me without having to type the variable names to avoid errors. Here is my code for removing variables with zero values only. It works to produce a cleaned output data set y from a raw data set x without using the names of the variables. I hope others could have a better solution or help me to make mine better.
%Macro dropZeroV(x, y) ;
proc means data = &x. ;
var _numeric_;
output out = sumTab ; run;
proc transpose data = sumTab(drop = _TYPE_) out= sumt; var _Numeric_; id _STAT_; run;
%let dropLst =;
proc sql noprint;
select _NAME_ into : dropLst separated by ' '
from sumT
where Max=0 and Min =0;
quit;
data &y.;
set &x.; drop &dropLst.;
run;
proc print data = &y.; run;
%Mend dropZeroV;
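Use STACKODS and the ODS Summary table to get the statistics in the required layout in one step rather than several. This version requests only the sum, which works when the values cannot be negative: in that case a sum of 0 means every value is 0. If negative values are possible, positive and negative values can cancel out, so check MIN and MAX instead. You may also want to round the statistics to avoid numeric precision issues.
The PROC MEANS and PROC TRANSPOSE steps become:
ods select none;
proc means data= &x. stackods sum;
var _numeric_;
ods output summary = sumT;
run;
ods select all;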
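One way the rest of the macro might look with a STACKODS table, checking MIN and MAX rather than SUM so that offsetting positive and negative values are not mistaken for zeros. This is a sketch under assumptions: the macro name dropZeroV2 is hypothetical, and the column names Variable, Min and Max are what the ODS Summary table is expected to contain with STACKODS, so it is worth confirming them with PROC CONTENTS on your release.

```sas
/* Sketch only: x = input dataset, y = output dataset (hypothetical macro name) */
%macro dropZeroV2(x, y);
  %local dropLst;

  ods select none;
  proc means data=&x. stackods min max;
    var _numeric_;
    ods output summary=sumT;   /* one row per variable */
  run;
  ods select all;              /* restore normal output */

  proc sql noprint;
    select Variable into :dropLst separated by ' '
    from sumT
    where Min = 0 and Max = 0;
  quit;

  data &y.;
    set &x.;
    /* Skip the DROP statement when no variable qualified */
    %if %length(&dropLst.) %then %do;
      drop &dropLst.;
    %end;
  run;
%mend dropZeroV2;
```

Variables that are entirely missing have missing MIN and MAX, so this version leaves them in place; the WHERE clause would need extending to drop those as well.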

Using output of Proc Freq variables for input into Proc Format

I want to capture the number of levels of a variable, along with its unique values, from the PROC FREQ output, but my current method does not work. I then want to map the unique IDs to the numbers 1 through num_levels from PROC FREQ.
Here is what I have for proc freq:
PROC FREQ DATA=my_data (keep=IDs) nlevels;
table all/out=out_data;
%let dim=levels;
%let IDs;
run;
Then I tried to use the macro variables, but it didn't work. I am including the manual version of my PROC FORMAT to show what I am trying to achieve; ideally it would be more automated.
PROC FORMAT;
INVALUE INDEX
"1234" = 1
"2345" = 2
.
.
.
"8901" =25;
/*25 represents the output of the levels
variable from proc freq but I couldn't figure out how to grab that either*/
RUN;
Any help would be appreciated.
Thank you!
Here's a fully worked solution that illustrates the PROC FORMAT CNTLIN way of doing this. The idea here is to mask names with the observation number instead.
*Create list of unique names;
proc freq data=sashelp.class noprint;
table name/out = mask;
run;
*create control data set. Variables that need to be set are:
fmtname, start, label and type;
data name_fmt;
set mask;
fmtname = 'namefmt';
type='J';
*J specifies a character informat, C would be a character format;
start=name;
label = put(_n_, z2.); *Use the row number as the recoded value;
run;
*create the format;
proc format cntlin=name_fmt;
run;
*Sample usage;
data class;
set sashelp.class;
name_masked = input(name, $namefmt.);
drop name;
run;
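If a numeric index is wanted, as in the INVALUE INDEX example in the question, a variation is to build a numeric informat (type 'I') and capture the number of levels in a macro variable along the way. This is only a sketch: the dataset names levels, index_fmt and class_indexed and the macro variable dim are chosen for illustration.

```sas
/* Unique values of the ID variable (NAME stands in for the real ID) */
proc freq data=sashelp.class noprint;
  table name / out=levels(keep=name);
run;

/* Control dataset for a numeric informat called INDEX */
data index_fmt;
  set levels end=eof;
  retain fmtname 'index' type 'I';  /* I = numeric informat */
  start = name;                     /* text to be read */
  label = cats(_n_);                /* value to return: 1, 2, 3, ... */
  if eof then call symputx('dim', _n_);  /* number of levels */
run;

proc format cntlin=index_fmt;
run;

%put NOTE: number of levels is &dim.;

/* Sample usage: NAME_INDEX is numeric */
data class_indexed;
  set sashelp.class;
  name_index = input(name, index.);
run;
```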

Create new variables from format values

What I want to do: I need to create a new variable for each value label of a variable and do some recoding. I have all the value labels output from an SPSS file (see sample).
Sample:
proc format; library = library ;
value SEXF
1 = 'Homme'
2 = 'Femme' ;
value FUMERT1F
0 = 'Non'
1 = 'Oui , occasionnellement'
2 = 'Oui , régulièrement'
3 = 'Non mais j''ai déjà fumé' ;
value ... (many more with different amount of levels)
The new variable name would be the format name without the trailing F, followed by an underscore and the level (example: FUMERT1F level 0 would become FUMERT1_0).
After that I need to recode the variables following this pattern:
data ds; set ds;
FUMERT1_0=0;
if FUMERT1=0 then FUMERT1_0=1;
FUMERT1_1=0;
if FUMERT1=1 then FUMERT1_1=1;
FUMERT1_2=0;
if FUMERT1=2 then FUMERT1_2=1;
FUMERT1_3=0;
if FUMERT1=3 then FUMERT1_3=1;
run;
Any help will be appreciated :)
EDIT: Both answers from Joe and the one from data_null_ work, but Stack Overflow won't let me accept more than one answer.
Update: add an underscore to the end of each name. It looks like there is no option in PROC TRANSREG to put an underscore between the variable name and the value of the class variable, so we can use a temporary rename instead. Create name=newname pairs to rename each class variable so that it ends in an underscore, and matching pairs to rename them back. This uses the CAT functions and SQL INTO macro variables.
data have;
call streaminit(1234);
do caseID = 1 to 1e4;
fumert1 = rand('table',.2,.2,.2) - 1;
sex = first(substrn('MF',rand('table',.5),1));
output;
end;
stop;
run;
%let class=sex fumert1;
proc transpose data=have(obs=0) out=vnames;
var &class;
run;
proc print;
run;
proc sql noprint;
select catx('=',_name_,cats(_name_,'_')), catx('=',cats(_name_,'_'),_name_), cats(_name_,'_')
into :rename1 separated by ' ', :rename2 separated by ' ', :class2 separated by ' '
from vnames;
quit;
%put NOTE: &=rename1;
%put NOTE: &=rename2;
%put NOTE: &=class2;
proc transreg data=have(rename=(&rename1));
model class(&class2 / zero=none);
id caseid;
output out=design(drop=_: inter: rename=(&rename2)) design;
run;
%put NOTE: _TRGIND(&_trgindn)=&_trgind;
First try:
Looking at the code you supplied and the output from Joe's I don't really understand the need for the formats. It looks to me like you just want to create dummies for a list of class variables. That can be done with TRANSREG.
data have;
call streaminit(1234);
do caseID = 1 to 1e4;
fumert1 = rand('table',.2,.2,.2) - 1;
sex = first(substrn('MF',rand('table',.5),1));
output;
end;
stop;
run;
proc transreg data=have;
model class(sex fumert1 / zero=none);
id caseid;
output out=design(drop=_: inter:) design;
run;
proc contents;
run;
proc print data=design(obs=40);
run;
One good alternative to your code is to use proc transpose. It won't get you 0's in the non-1 cells, but those are easy enough to get. It does have the disadvantage that it makes it harder to get your variables in a particular order.
Basically, transpose once to vertical, then transpose back using the old variable name concatenated to the variable value as the new variable name. Hat tip to Data null for showing this feature in a recent SAS-L post. If your version of SAS doesn't support concatenation in PROC TRANSPOSE, do it in the data step beforehand.
I show using PROC EXPAND to then set the missings to 0, but you can do this in a data step as well if you don't have ETS or if PROC EXPAND is too slow. There are other ways to do this - including setting up the dataset with 0s pre-proc-transpose - and if you have a complicated scenario where that would be needed, this might make a good separate question.
data have;
do caseID = 1 to 1e4;
fumert1 = rand('Binomial',.3,3);
sex = rand('Binomial',.5,1)+1;
output;
end;
run;
proc transpose data=have out=want_pre;
by caseID;
var fumert1 sex;
copy fumert1 sex;
run;
data want_pre_t;
set want_pre;
x=1; *dummy variable;
run;
proc transpose data=want_pre_t out=want delim=_;
by caseID;
var x;
id _name_ col1;
copy fumert1 sex;
run;
proc expand data=want out=want_e method=none;
convert _numeric_ /transformin=(setmiss 0);
run;
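For the data step alternative to PROC EXPAND mentioned above, a minimal sketch could look like this. It sets every missing numeric value in WANT to 0, and the loop index _i is just a scratch variable:

```sas
data want_e;
  set want;
  array nums {*} _numeric_;   /* all numeric variables read from WANT */
  do _i = 1 to dim(nums);
    if missing(nums{_i}) then nums{_i} = 0;
  end;
  drop _i;
run;
```

Like the PROC EXPAND call, this touches every numeric variable, so an ID column with missing values would be zero-filled as well.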
For this method, you need to use two concepts: the cntlout dataset from proc format, and code generation. This method will likely be faster than the other option I presented (as it passes through the data only once), but it does rely on the variable name <-> format relationship being straightforward. If it's not, a slightly more complex variation will be required; you should post to that effect, and this can be modified.
First, the cntlout option in proc format makes a dataset of the contents of the format catalog. This is not the only way to do this, but it's a very easy one. Specify the appropriate libname as you would when you create a format, but instead of making one, it will dump the dataset out, and you can use it for other purposes.
Second, we create a macro that performs your action one time (creating a variable with the name_value name and then assigning it to the appropriate value) and then use proc sql to make a bunch of calls to that macro, once for each row in your cntlout dataset. Note - you may need a where clause here, or some other modifications, if your format library includes formats for variables that aren't in your dataset - or if it doesn't have the nice neat relationship your example does. Then we just make those calls in a data step.
*Set up formats and dataset;
proc format;
value SEXF
1 = 'Homme'
2 = 'Femme' ;
value FUMERT1F
0 = 'Non'
1 = 'Oui , occasionnellement'
2 = 'Oui , régulièrement'
3 = 'Non mais j''ai déjà fumé' ;
quit;
data have;
do caseID = 1 to 1e4;
fumert1 = rand('Binomial',.3,3);
sex = rand('Binomial',.5,1)+1;
output;
end;
run;
*Dump formats into table;
proc format cntlout=formats;
quit;
*Macro that does the above assignment once;
%macro spread_var(var=, val=);
&var._&val.= (&var.=&val.); *result of boolean expression is 1 or 0 (T=1 F=0);
%mend spread_var;
*make the list. May want NOPRINT option here as it will make a lot of calls in your output window otherwise, but I like to see them as output.;
proc sql;
select cats('%spread_var(var=',substr(fmtname,1,length(Fmtname)-1),',val=',start,')')
into :spreadlist separated by ' '
from formats;
quit;
*Actually use the macro call list generated above;
data want;
set have;
&spreadlist.;
run;
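If the format catalog also holds formats for variables that are not in the dataset, the WHERE clause mentioned above could be sketched as follows. It assumes the same naming rule as the example (format name = variable name plus one trailing letter) and that the data is WORK.HAVE:

```sas
proc sql noprint;
  select cats('%spread_var(var=',
              substr(f.fmtname, 1, length(f.fmtname) - 1),
              ',val=', f.start, ')')
    into :spreadlist separated by ' '
  from formats as f
  /* keep only formats whose base name matches a column of WORK.HAVE */
  where upcase(substr(f.fmtname, 1, length(f.fmtname) - 1)) in
        (select upcase(name)
         from dictionary.columns
         where libname = 'WORK' and memname = 'HAVE');
quit;
```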

SAS loop through datasets

I have multiple tables in a library called snap1:
cust1, cust2, cust3, etc
I want to write a loop that gets the record count of the same column in each of these tables and then inserts the results into a different table.
My desired output is:
Table Count
cust1 5,000
cust2 5,555
cust3 6,000
I'm trying this, but it's not working:
%macro sqlloop(data, byvar);
proc sql noprint;
select &byvar.into:_values SEPARATED by '_'
from %data.;
quit;
data_&values.;
set &data;
select (%byvar);
%do i=1 %to %sysfunc(count(_&_values.,_));
%let var = %sysfunc(scan(_&_values.,&i.));
output &var.;
%end;
end;
run;
%mend;
%sqlloop(data=libsnap, byvar=membername);
First off, if you just want the number of observations, you can get that trivially from dictionary.tables or sashelp.vtable without any loops.
proc sql;
select memname, nlobs
from dictionary.tables
where libname='SNAP1';
quit;
NLOBS is the number of logical observations, so it excludes rows that have been marked as deleted (usually by a DELETE in PROC SQL). NOBS, by contrast, counts physical observations and would still include them.
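Since the goal is to store the counts in another table, the same query can create one directly. A sketch, with the output name work.table_counts and the CUST prefix filter as assumptions based on the question:

```sas
proc sql;
  create table work.table_counts as
  select memname as table_name,
         nlobs   as row_count format=comma12.
  from dictionary.tables
  where libname = 'SNAP1'
    and memtype = 'DATA'
    and memname like 'CUST%';
quit;
```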
Second, if you're interested in the number of valid responses, there are easier non-loopy ways too.
For example, given whatever query that you can write determining your table names, we can just put them all in a set statement and count in a simple data step.
%let varname=mycol; *the column you are counting;
%let libname=snap1;
proc sql;
select cats("&libname..",memname)
into :tables separated by ' '
from dictionary.tables
where libname=upcase("&libname.");
quit;
data counts;
set &tables. indsname=ds_name end=eof; *9.3 or later;
retain count dataset_name;
if _n_=1 then count=0;
if ds_name ne lag(ds_name) and _n_ ne 1 then do;
output;
count=0;
end;
dataset_name=ds_name;
count = count + ifn(&varname.,1,1,0); *true, false, missing; *false is 0 only;
if eof then output;
keep count dataset_name;
run;
Macros are rarely needed for this sort of thing, and macro loops like you're writing even less so.
If you did want to write a macro, the easier way to do it is:
1. Write code to do it once, for one dataset
2. Wrap that in a macro that takes a parameter (dataset name)
3. Create macro calls for that macro as needed
That way you don't have to deal with %scan and troubleshooting macro code that's hard to debug. You write something that works once, then just call it several times.
proc sql;
select cats('%mymacro(name=',"&libname..",memname,')')
into :macrocalls separated by ' '
from dictionary.tables
where libname=upcase("&libname.");
quit;
&macrocalls.;
This assumes you have a macro, %mymacro, that does whatever counting you want for one dataset.
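As an illustration only, %mymacro might look like the sketch below. It counts non-missing values of the column named in &varname. (set earlier with %let) and appends one row per dataset to a table; the names _one and counts_by_macro are hypothetical. The macro has to be defined before &macrocalls. is resolved.

```sas
%macro mymacro(name=);
  /* COUNT(column) counts non-missing values only */
  proc sql noprint;
    create table _one as
    select "&name." as dataset_name length=41,
           count(&varname.) as count
    from &name.;
  quit;

  /* Stack the single-row result onto the running table */
  proc append base=counts_by_macro data=_one force;
  run;
%mend mymacro;
```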
* Updated *
In the future, please post the log so we can see what is specifically not working. I can see some issues in your code, particularly where your macro variables are being declared, and a select statement that is not doing anything. Here is an alternative process to achieve your goal:
Step 1: Read all of the customer datasets in the snap1 library into a macro variable:
proc sql noprint;
select memname
into :total_cust separated by ' '
from sashelp.vmember
where upcase(memname) LIKE 'CUST%'
AND upcase(libname) = 'SNAP1';
quit;
Step 2: Count the total number of obs in each data set, output to permanent table:
%macro count_obs;
%do i = 1 %to %sysfunc(countw(&total_cust) );
%let dsname = %scan(&total_cust, &i);
%let dsid=%sysfunc(open(snap1.&dsname) );
%let nobs=%sysfunc(attrn(&dsid,nobs) );
%let rc=%sysfunc(close(&dsid) );
data _total_obs;
length Member_Name $15.;
Member_Name = "&dsname";
Total_Obs = &nobs;
format Total_Obs comma8.;
run;
proc append base=Total_Obs
data=_total_obs;
run;
%end;
proc datasets lib=work nolist;
delete _total_obs;
quit;
%mend;
%count_obs;
You will need to delete the permanent table Total_Obs if it already exists, but you can add code to handle that if you wish.
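For example, a step like this run before %count_obs would clear a leftover table. NOWARN keeps PROC DATASETS from raising an error when Total_Obs does not exist yet:

```sas
proc datasets lib=work nolist nowarn;
  delete Total_Obs;
quit;
```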
If you want to get the total number of non-missing observations for a particular column, do the same code as above, but delete the 3 %let statements below %let dsname = and replace the data step with:
data _total_obs;
length Member_Name $7.;
set snap1.&dsname end=eof;
retain Member_Name "&dsname";
if(NOT missing(var) ) then Total_Obs+1;
if(eof);
format Total_Obs comma8.;
run;
(Update: Fixed %do loop in step 2)

SAS: PROC IMPORT: CSV WITH DATES AS VAR NAMES

I'm importing CSV data in the following format:
SEDOL,12/08/2009,13/08/2009,14/08/2009,17/08/2009,18/08/2009
B1YVN39,7.8431,7.8431,7.8431,7.8431,7.598
B00G7R3,3.8,3.61,3.81,3.81,3.81
2965237,4.5351,4.5351,4.5351,4.5351,4.5351
2554345,7.355,7.355,7.355,7.355,7.355
I'm using the following command:
PROC IMPORT OUT= want
DATAFILE= have
DBMS=CSV REPLACE;
RUN;
Then transposing the data to long format, as follows:
PROC SORT DATA=want OUT=want; BY SEDOL;RUN;
proc transpose data=want out=transp;
by SEDOL;
run;
proc print; run;
How can I import the dates correctly formatted and change the variable type from default to date?
Importing and transposing are handy procedures, but if you understand your data well, a little data step program can deal with this in one step:
data want(keep=sedol v_date v_value);
infile have dsd dlm=',' truncover;
informat sedol $8. d1-d50 ddmmyy10. v1-v50 8.;
format v_date yymmdd10.;
array d(50) d1-d50;
array v(50) v1-v50;
/* Retain the date values and the count of dates */
retain d1-d50 idx;
/* Read header */
if _n_ = 1 then do;
input sedol d1-d50;
/* loop to find how many date columns there are */
do idx=1 to 50 while(d(idx) ne .);
end;
idx = idx - 1; /* must subtract one here */
delete;
end;
/* Read data lines */
input sedol v1-v50;
do i=1 to idx;
v_date = d(i);
v_value = v(i);
output;
end;
run;
As long as your input file is exactly as you describe (a header record with a leading ID variable of at most 8 characters followed by some number of date values representing columns), this will process up to 50 measurements. It should be easy enough to modify if your needs change.
In this case I would suggest importing the data and the headers separately.
First, we import data:
PROC IMPORT OUT= want
DATAFILE= "C:\have.csv"
DBMS=CSV REPLACE;
getnames=no;
datarow=2;
RUN;
Then we import only the first row, which holds the variable names:
options obs=1;
PROC IMPORT OUT= header
DATAFILE= "C:\have.csv"
DBMS=CSV REPLACE;
getnames=no;
RUN;
options obs=max;
Then we transpose the header row into a column and "mask" the values that are illegal as SAS names: add a letter (it doesn't matter which one, I chose 'D') as the first character and replace all slashes '/' with underscores '_':
proc transpose data=header out=header(drop=_name_);var _all_;run;
data header;
set header;
if anydigit(substr(COL1,1,1)) then COL1=cats("D",COL1);
COL1=translate(COL1,"_","/");
run;
Put these new 'cleaned' column names into a macro variable:
proc sql noprint;
select COL1 into :names separated by ' '
from header;
quit;
And generate a DATA step for renaming using the CALL EXECUTE routine:
data _null_;
dsid=open("want","i");
num=attrn(dsid,"nvars");
call execute("data want;");
call execute("set want;");
call execute("rename");
do i=1 to num;
call execute(varname(dsid,i)||"="||scan("&names",i," "));
end;
call execute(";run;");
rc=close(dsid);
run;
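For the sample file in the question, the statements queued by CALL EXECUTE should amount to a step like the one below. This assumes PROC IMPORT named the columns VAR1 to VAR6, its default with GETNAMES=NO, and that the header values were masked as described:

```sas
/* Illustration of the generated code, not something to run in addition */
data want;
  set want;
  rename
    VAR1=SEDOL
    VAR2=D12_08_2009
    VAR3=D13_08_2009
    VAR4=D14_08_2009
    VAR5=D17_08_2009
    VAR6=D18_08_2009
  ;
run;
```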
Now your original SORT and TRANSPOSE:
PROC SORT DATA=want OUT=want; BY SEDOL;RUN;
proc transpose data=want out=transp;
by SEDOL;
run;
Finally, 'unmask' the dates (deleting the leading D and replacing _ with /) and convert them to real dates with INPUT(). The RETAIN statement is there only to place the new variable DATE second, right after SEDOL.
data transp;
retain SEDOL date;
set transp;
substr(_name_,1,1)='';
_name_=translate(_name_,"/","_");
date=input(strip(_name_),ddmmyy10.);
drop _name_;
format date ddmmyy10.;
run;