The following code, built using the Summary Statistics task from SAS Enterprise Guide, finds the min of each column of a table.
How can I find the second smallest value?
I tried replacing MIN with SMALLEST(2) but doesn't work.
Thank you.
TITLE;
TITLE1 "Summary Statistics";
TITLE2 "Results";
FOOTNOTE;
FOOTNOTE1 "Generated by the SAS System (&_SASSERVERNAME, &SYSSCPL) on
%TRIM (%QSYSFUNC(DATE(), NLDATE20.)) at
%TRIM(%SYSFUNC(TIME(), TIMEAMPM12.))";
PROC MEANS DATA=WORK.SORTTempTableSorted
NOPRINT
CHARTYPE
MIN NONOBS ;
VAR A B C;
OUTPUT OUT=WORK.MEANSummaryStats(LABEL="Summary Statistics for
WORK.QUERY_FOR_TRNSTRANSPOSEDPD__0001")
MIN()=
/ AUTONAME AUTOLABEL INHERIT
;
RUN;
Using the ExtremeValue table from PROC UNIVARIATE.
ods select none;
ods output ExtremeValues=ExtremeValues(where=(loworder=2) drop=high:);
proc univariate data=sashelp.class NEXTRVAL=2;
run;
ods select all;
proc print;
run;
I don't think there's any way to accomplish this within proc means. There are ways using a variety of other procs. The Univariate procedure highlights one method using the Extreme Observations.
https://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_univariate_sect058.htm
title 'Extreme Blood Pressure Observations';
ods select ExtremeObs;
proc univariate data=BPressure;
var Systolic Diastolic;
id PatientID;
run;
proc print data=ExtremeObs;
run;
I will assume you are interested in all numeric columns. If the IFLIST macro variable is longer than 64k bytes in length due to the number of numeric variables and the length of their names, this code will fail. It should work for all reasonably narrow data sets.
UNTESTED CODE
We get a list of the variables in your data set.
PROC CONTENTS DATA=SORTTempTableSorted OUT=md NOPRINT ;
RUN ;
We use that list to create statements and expressions.
IFLIST is a block of statements to store the minimum value of fieldname in fieldname_1 and the second lowest in fieldname_2. If the comparison is LT, then we keep distinct values, not necessarily the order statistics. If the comparision is LE and there are multiple observations with the minimum value, fieldname_1 and fieldname_2 will be equal to each other. I will assume you want distinct values.
MAXLIST is an expression that will resolve to the largest numeric value in the data set :)
MINLIST and MINLIST2 are created for use in RETAIN and KEEP statements.
PROC SQL STIMER NOPRINT EXEC ;
SELECT 'IF ' || name || ' LT ' || name '_1 THEN DO;' ||
name || '_2=' || name || '_1;' ||
name || '_1=' || name || ';END;ELSE IF ' ||
name || ' LT ' || name || '_2 THEN ' ||
name || '_2=' || name,
'MAX(' || name || ')',
name || '_1',
name || '_2'
INTO :iflist SEPARATED BY '; ',
:maxlist SEPARATED BY '<>'
:minlist SEPARATED BY ' ',
:min2list SEPARATED BY ' '
FROM md
WHERE type EQ 1
;
Now we get the largest numeric value from the data set:
SELECT &maxlist
INTO :maxval
FROM SORTTempTableSorted
;
QUIT ;
Now we do the work. The END option sets "eof" to 1 on the last observation, which is the only time we want to write a record to the output data set.
DATA min2 ;
SET SORTTempTableSorted END=eof;
RETAIN &minlist &min2list &maxval;
KEEP &minlist &min2list ;
&iflist ;
IF eof THEN
OUTPUT ;
RUN ;
A data step solution that should work for any number of columns without ever running into macro limitations:
proc sql noprint;
select count(*) into :NUM_COUNT from dictionary.columns
where LIBNAME='SASHELP' and MEMNAME = 'CLASS' and TYPE = 'num';
quit;
data class_min2;
do until(eof);
set sashelp.class end = eof;
array min2[&NUM_COUNT,2] _temporary_;
array nums[*] _numeric_;
do _n_ = 1 to &NUM_COUNT;
min2[_n_,1] = min(min2[_n_,1],nums[_n_]);
if min2[_n_,1] < nums[_n_] then min2[_n_,2] = min(nums[_n_],min2[_n_,2]);
end;
end;
do _iorc_ = 1 to 2;
do _n_ = 1 to &NUM_COUNT;
nums[_n_] = min2[_n_,_iorc_];
end;
output;
end;
keep _NUMERIC_;
run;
This outputs the two lowest distinct values of each numeric variable, without transposing the data in the same way that proc univariate does. You can easily allow for duplicate minimum values with a bit of tweaking.
Sort the values in ascending order. Delete the first value. This would be the minimum value. Now the value left at the first position is your second minimum.
Related
I have a very big SAS Dataset with over 280 variables in it and I need retrieve all the complete NULL columns based on a Variable value. For example I have a Variable called Reported(with only values Yes & No) in this dataset and I want to find out based on value No, all the complete Null Columns in this dataset.
Is there any quick way to find this out with out writing all the columns names for complete NULL values?
So for example if I have 4 Variables in the table,
So based on the above table I would like to see the output like this where Var4='No' and only return the columns with all the missing values
This would help me to identify variables which are not being populated at all where the Var4 value is 'No'
Note the WHERE statement in PROC FREQ.
proc format;
value $_xmiss_(default=1 min=1 max=1) ' ' =' ' other='1';
value _xmiss_(default=1 min=1 max=1) ._-.Z=' ' other='1';
quit;
%let data=sashelp.heart;
proc freq data=&data nlevels;
where status eq: 'A';
ods select nlevels;
ods output nlevels=nlevels;
format _character_ $_xmiss_. _numeric_ _xmiss_.;
run;
data nlevels;
length TABLEVAR $32 TABLEVARLABEL $128 NLEVELS NMISSLEVELS NNONMISSLEVELS 8;
retain NLEVELS NMISSLEVELS NNONMISSLEVELS 0;
set nlevels;
run;
There are 2 parts to the question I think. First is to subset records where Reported = "N". Then among those records, report columns that have all missing values. If this is correct then you could do something as follows (I am assuming that the columns with missing values are all numeric. If not, this approach will need a slight modification):
/* Thanks to REEZA for pointing out this way of getting the freqs. This eliminates some constraints and is more efficient */
proc freq data=have nlevels ;
where var1 = "N" ;
ods output nlevels = freqs;
table _all_;
run;
proc sql noprint;
select TableVar into :cols separated by " " from freqs where NNonMissLevels = 0 ;
quit;
%put &cols;
data want;
set have (keep = &cols var1);
where var1 = "N" ;
run;
Is there a general purpose way of concatenating each variable in an observation into one larger variable whilst preserving the format of numeric/currency fields in terms of how it looks when you do a proc print on the dataset. (see sashelp.shoes for example)
Here is some code you can run, as you can see when looking at the log, using the catx function to produce a comma separated output removes both the $ currency sign as well as the period from the numeric variables
proc print data=sashelp.shoes (obs=10);
run;
proc sql;
select name into :varstr2 separated by ','
from dictionary.columns
where libname = "SASHELP" and
memname = "SHOES";
quit;
data stuff();
format all $5000.;
set sashelp.shoes ;
all = catx(',',&varstr2.) ;
put all;
run;
Any solution needs to be general purpose as it will run on disparate datasets with differently formatted variables.
You can manually loop over PDV variables of the data set, concatenating each formatted value retrieved with vvaluex. A hash can be used to track which variables of the data set to process. If you are comma separating values you will probably want to double quote formatted values that contain a comma.
data want;
set sashelp.cars indsname=_data;
if _n_ = 1 then do;
declare hash vars();
length _varnum 8 _varname $32;
vars.defineKey('_n_');
vars.defineData('_varname');
vars.defineDone();
_dsid = open(_data);
do _n_ = 1 to attrn(_dsid,'NVAR');
rc = vars.add(key:_n_,data:varname(_dsid,_n_));
end;
_dsid = close(_dsid);
call missing (of _:);
end;
format weight comma7.;
length allcat $32000 _vvx $32000;
do _n_ = 1 to vars.NUM_ITEMS;
vars.find();
_vvx = strip(vvaluex(_varname));
if index(_vvx,",") then _vvx = quote(strip(_vvx));
if _n_ = 1
then allcat = _vvx;
else allcat = cats(allcat,',',_vvx);
end;
drop _:;
run;
You can use import and export to csv file:
filename tem temp;
proc export data=sashelp.SHOES file=tem dbms=csv replace;
run;
data l;
length all $ 200;
infile tem truncover firstobs=2;
input all 1-200;
run;
P.S.
If you need concatenate only char, uou can create array of all CHARACTER columns in dataset, and just iterate thru:
data l;
length all $ 5000;
set sashelp.SHOES;
array ch [*] _CHARACTER_;
do i = 1 to dim(ch);
all=catx(',',all,ch[i]);
end;
run;
The PUT statement is the easiest way to do that. You don't need to know the variables names as you can use the _all_ variable list.
put (_all_) (+0);
It will honor the formats attached the variables and if you have used DSD option on the FILE statement then the result is a delimited list.
What is the ultimate goal of this exercise? If you want to create a file you can just write the file directly.
data _null_;
set sashelp.shoes(obs=3);
file 'myfile.csv' dsd ;
put (_all_) (+0);
run;
If you really do want to get that string into a dataset variable there is no need to invent some new function. Just take advantage of the PUT statements abilities by creating a file and then reading the lines from the file.
filename junk temp;
data _null_;
set sashelp.shoes(obs=3);
file junk dsd ;
put (_all_) (+0);
run;
data stuff ;
set sashelp.shoes(obs=3);
infile junk truncover ;
input all $5000.;
run;
You can even do it without creating the full text file. Instead just write one line at a time and save the line into a variable using the _FILE_ automatic variable.
filename junk temp;
data stuff;
set sashelp.shoes(obs=3);
file junk dsd lrecl=5000 ;
length all $5000;
put #1 (_all_) (+0) +(-2) ' ' #;
all = _file_;
output;
all=' ';
put #1 all $5000. #;
run;
Solution with vvalue and concat function (||):
It is similar with 'solution without catx' (the last one), but it is simplified by vvalue function instead put.
/*edit sashelp.shoes with missing values in Product as test-cases*/
proc sql noprint;
create table wocatx as
select * from SASHELP.SHOES;
update wocatx
set Product = '';
quit;
/*Macro variable for concat function (||)*/
proc sql;
select ('strip(vvalue('|| strip(name) ||'))') into :varstr4 separated by "|| ',' ||"
from dictionary.columns
where libname = "WORK" and
memname = "WOCATX";
quit;
/*Data step to concat all variables*/
data stuff2;
format all $5000.;
set work.wocatx ;
all = &varstr4. ;
put all;
run;
Solution with catx:
proc print data=SASHELP.SHOES;
run;
proc sql;
select ifc(strip(format) is missing,strip(name),ifc(type='num','put('|| strip(name) ||','|| strip(format) ||')','input('|| strip(name) ||','|| strip(format) ||')')) into :varstr2 separated by ','
from dictionary.columns
where libname = "SASHELP" and
memname = "SHOES";
quit;
data stuff();
format all $5000.;
set sashelp.shoes ;
all = catx(',',&varstr2.) ;
put all;
run;
If there isn't in dictionary.columns format, then in macro variable varstr2 will just name, if there is format, then when it would call in catx it will convert in format, that you need, for example,if variable is num type then put(Sales,DOLLAR12.), or if it char type then input function . You could add any conditions in select into if you need.
If there is no need of using of input function just change select:
ifc(strip(format) is missing,strip(name),'put('|| strip(name) ||','|| strip(format) ||')')
Solution without catx:
/*edit sashelp.shoes with missing values in Product as test-cases*/
proc sql noprint;
create table wocatx as
select * from SASHELP.SHOES;
update wocatx
set Product = '';
quit;
/*Macro variable for catx*/
proc sql;
select ifc(strip(format) is missing,strip(name),ifc(type='num','put('|| strip(name) ||','|| strip(format) ||')','input('|| strip(name) ||','|| strip(format) ||')')) into :varstr2 separated by ','
from dictionary.columns
where libname = "WORK" and
memname = "WOCATX";
quit;
/*data step with catx*/
data stuff;
format all $5000.;
set work.wocatx ;
all = catx(',',&varstr2.) ;
put all;
run;
/*Macro variable for concat function (||)*/
proc sql;
select ifc(strip(format) is missing,
'strip(' || strip(name) || ')',
'strip(put('|| strip(name) ||','|| strip(format) ||'))') into :varstr3 separated by "|| ',' ||"
from dictionary.columns
where libname = "WORK" and
memname = "WOCATX";
quit;
/*Data step without catx*/
data stuff1;
format all $5000.;
set work.wocatx ;
all = &varstr3. ;
put all;
run;
Result with catx and missing values:
Result without catx and with missing values:
I want to achieve the same output but instead of harcoding each of the array-element use something like var1 - var10 but that would jump by 10 like decades.
data work.test(keep= statename pop_diff:);
set sashelp.us_data(keep=STATENAME POPULATION:);
array population_array {*} POPULATION_1910 -- POPULATION_2010;
dimp = dim(population_array);
/* here and below something like:
array pop_diff_amount {10} pop_diff_amount_1920 -- pop_diff_amount_2010;*/
array pop_diff_amount {10} pop_diff_amount_1920 pop_diff_amount_1930
pop_diff_amount_1940 pop_diff_amount_1950
pop_diff_amount_1960 pop_diff_amount_1970
pop_diff_amount_1980 pop_diff_amount_1990
pop_diff_amount_2000 pop_diff_amount_2010;
array pop_diff_prcnt {10} pop_diff_prcnt_1920 pop_diff_prcnt_1930
pop_diff_prcnt_1940 pop_diff_prcnt_1950
pop_diff_prcnt_1960 pop_diff_prcnt_1970
pop_diff_prcnt_1980 pop_diff_prcnt_1990
pop_diff_prcnt_2000 pop_diff_prcnt_2010;
do i=1 to dim(population_array) - 1;
pop_diff_amount{i} = population_array{i+1} - population_array{i};
pop_diff_prcnt{i} = (population_array{i+1} / population_array{i} -1) * 100;
end;
RUN;
I am still beginner in it therefore I am not sure is this possible or easy to achieve.
Thanks!
Not automatic but not all that difficult either. First create a data set of the names then transpose and use an unexecuted set to bring in the names and then define arrays. Note how arrays are define using [*] and name: as you did with population_array.
data names;
do type = 'Amount','Prcnt';
do year=1920 to 2010 by 10;
length _name_ $32;
_name_ = catx('_','pop_diff',type,year);
output;
end;
end;
run;
proc print;
run;
proc transpose data=names out=pop_diff(drop=_name_);
var;
run;
proc contents varnum;
run;
data pop;
set sashelp.us_data(keep=STATENAME POPULATION:);
array population_array {*} POPULATION_1910 -- POPULATION_2010;
if 0 then set pop_diff;
array pop_diff_amount[*] pop_diff_amount:;
array pop_diff_prcnt[*] pop_diff_prcnt:;
do i=1 to dim(population_array) - 1;
pop_diff_amount{i} = population_array{i+1} - population_array{i};
pop_diff_prcnt{i} = (population_array{i+1} / population_array{i} -1) * 100;
end;
run;
proc print data=pop;
run;
SAS is automatically going to increment the array elements by 1. Here is an alternative solution that creates the variables using one extra step to create a set of macro variables that hold the desired variable names. Since you are basing them off of the variable POPULATION_<year>, we will simply grab the years from those variable names, create the variable names for the arrays that we want, and store them into a few macro variables.
proc sql noprint;
select cats('pop_diff_amount_', scan(name, -1, '_') )
, cats('pop_diff_prcnt_', scan(name, -1, '_') )
into :pop_diff_amount_vars separated by ' '
, :pop_diff_prcnt_vars separated by ' '
from dictionary.columns
where libname = 'SASHELP'
AND memname = 'US_DATA'
AND upcase(name) LIKE 'POPULATION_%'
;
quit;
data work.test(keep= statename pop_diff:);
set sashelp.us_data(keep=STATENAME POPULATION:);
array population_array {*} POPULATION_1910 -- POPULATION_2010;
dimp = dim(population_array);
array pop_diff_amount {*} &pop_diff_amount_vars.;
array pop_diff_prcnt {*} &pop_diff_prcnt_vars.;
do i=1 to dim(population_array) - 1;
pop_diff_amount{i} = population_array{i+1} - population_array{i};
pop_diff_prcnt{i} = (population_array{i+1} / population_array{i} -1) * 100;
end;
RUN;
Getting the data out of the meta data (create variable year) would make coding life easier.
proc transpose data=sashelp.us_data out=us_pop(rename=(col1=Population));
by statename;
var population_:;
run;
data us_pop;
set us_pop;
by statename;
year = input(scan(_name_,-1,'_'),4.);
pop_diff_amount=dif(population);
pop_diff_prcnt =(population/lag(population))-1;
format pop_diff_prcnt percent10.2;
if first.statename then call missing(of pop_diff_amount pop_diff_prcnt);
drop _:;
run;
proc print data=us_pop(obs=10);
run;
I have a dataset with some variables named sx for x = 1 to n.
Is it possible to write a freq which gives the same result as:
proc freq data=prova;
table s1 * s2 * s3 * ... * sn /list missing;
run;
but without listing all the names of the variables?
I would like an output like this:
S1 S2 S3 S4 Frequency
A 10
A E 100
A E J F 300
B 10
B E 100
B E J F 300
but with an istruction like this (which, of course, is invented):
proc freq data=prova;
table s1:sn /list missing;
run;
Why not just use PROC SUMMARY instead?
Here is an example using two variables from SASHELP.CARS.
So this is PROC FREQ code.
proc freq data=sashelp.cars;
where make in: ('A','B');
tables make*type / list;
run;
Here is way to get counts using PROC SUMMARY
proc summary missing nway data=sashelp.cars ;
where make in: ('A','B');
class make type ;
output out=want;
run;
proc print data=want ;
run;
If you need to calculate the percentages you can instead use the WAYS statement to get both the overall and the individual cell counts. And then add a data step to calculate the percentages.
proc summary missing data=sashelp.cars ;
where make in: ('A','B');
class make type ;
ways 0 2 ;
output out=want;
run;
data want ;
set want ;
retain total;
if _type_=0 then total=_freq_;
percent=100*_freq_/total;
run;
So if you have 10 variables you would use
ways 0 10 ;
class s1-s10 ;
If you just want to build up the string "S1*S2*..." then you could use a DO loop or a macro %DO loop and put the result into a macro variable.
data _null_;
length namelist $200;
do i=1 to 10;
namelist=catx('*',namelist,cats('S',i));
end;
call symputx('namelist',namelist);
run;
But here is an easy way to make such a macro variable from ANY variable list not just those with numeric suffixes.
First get the variables names into a dataset. PROC TRANSPOSE is a good way if you use the OBS=0 dataset option so that you only get the _NAME_ column.
proc transpose data=have(obs=0) ;
var s1-s10 ;
run;
Then use PROC SQL to stuff the names into a macro variable.
proc sql noprint;
select _name_
into :namelist separated by '*'
from &syslast
;
quit;
Then you can use the macro variable in your TABLES statement.
proc freq data=have ;
tables &namelist / list missing ;
run;
Car':
In short, no. There is no shortcut syntax for specifying a variable list that crosses dimension.
In long, yes -- if you create a surrogate variable that is an equivalent crossing.
Discussion
Sample data generator:
%macro have(top=5);
%local index;
data have;
%do index = 1 %to ⊤
do s&index = 1 to 2+ceil(3*ranuni(123));
%end;
array V s:;
do _n_ = 1 to 5*ranuni(123);
x = ceil(100*ranuni(123));
if ranuni(123) < 0.1 then do;
ix = ceil(&top*ranuni(123));
h = V(ix);
V(ix) = .;
output;
V(ix) = h;
end;
else
output;
end;
%do index = 1 %to ⊤
end;
%end;
run;
%mend;
%have;
As you probably noticed table s: created one freq per s* variable.
For example:
title "One table per variable";
proc freq data=have;
tables s: / list missing ;
run;
There is no shortcut syntax for specifying a variable list that crosses dimension.
NOTE: If you specify out=, the column names in the output data set will be the last variable in the level. So for above, the out= table will have a column "s5", but contain counts corresponding to combinations for each s1 through s5.
At each dimensional level you can use a variable list, as in level1 * (sublev:) * leaf. The same caveat for out= data applies.
Now, reconsider the original request discretely (no-shortcut) crossing all the s* variables:
title "1 table - 5 columns of crossings";
proc freq data=have;
tables s1*s2*s3*s4*s5 / list missing out=outEach;
run;
And, compare to what happens when a data step view uses a variable list to compute a surrogate value corresponding to the discrete combinations reported above.
data haveV / view=haveV;
set have;
crossing = catx(' * ', of s:); * concatenation of all the s variables;
keep crossing;
run;
title "1 table - 1 column of concatenated crossings";
proc freq data=haveV;
tables crossing / list missing out=outCat;
run;
Reality check with COMPARE, I don't trust eyeballs. If zero rows with differences (per noequal) then the out= data sets have identical counts.
proc compare noprint base=outEach compare=outCat out=diffs outnoequal;
var count;
run;
----- Log -----
NOTE: There were 31 observations read from the data set WORK.OUTEACH.
NOTE: There were 31 observations read from the data set WORK.OUTCAT.
NOTE: The data set WORK.DIFFS has 0 observations and 3 variables.
NOTE: PROCEDURE COMPARE used (Total process time)
Goal: Add one or more empty rows after a specific position in the dataset
Here's the work I've done so far:
data test_data;
set sashelp.class;
output;
/* Add a blank line after row 5*/
if _n_ = 5 then do;
call missing(of _all_);
output;
end;
/* Add 4 blank rows after row 7*/
if _n_ = 7 then do;
/* inserts a blank row after row 8 (7 original rows + 1 blank row) */
call missing(of _all_);
/*repeats the newly created blank row: inserts 3 blank rows*/
do i = 1 to 3;
output;
end;
end;
run;
I'm still learning how to use SAS, but I "feel" like there's a better way to get to the same result, chiefly not having to use a for loop to insert multiple empty rows. Doing this several times, I lose track of which values n should be as well. I was wondering:
Is there a better way to do this for rows?
Is there an equivalent for columns? (A similar question probably)
These rows/columns are being added more to fit a report format. The dataset doesn't need these blank rows/columns for its own sake. Is some PROC or REPORT function that achieves the same thing?
Do you just want to add blanks to your report after each group of observations?
For example lets make a dummy dataset with a group variable.
data test_data ;
set sashelp.class ;
if _n_ <= 5 then group=1;
else if _n_ <=7 then group=2 ;
else group=3 ;
run;
Now we can make a report that adds two blank lines after each group.
proc report data=test_data ;
column group name age ;
define group / group noprint ;
define name / display ;
define age / display;
compute after group ;
line #1 ' ';
line #1 ' ';
endcomp;
run;
Or is it the case that your report data does not have all possible values and you want your output to have all possible values?
So we wanted a report that counts how many kids in our class by age and we need to report for ages 10 to 18. But there are no 10 year olds in our class. Make a dummy table and merge with that.
proc freq data=sashelp.class ;
tables age / noprint out=counts;
run;
proc print data=counts;
run;
data shell;
do age=10 to 18;
retain count 0 percent 0;
output;
end;
run;
data want ;
merge shell counts ;
by age;
run;
proc print data=want ;
run;