Alternative to PROC SQL CASE statements in SAS

I have been able to do what I want with the following code.
But I have a large data set, and I want to do the same thing with SAS DATA step code instead of PROC SQL.
Here is the code:
proc sql;
  create table RTA_NDP_Red_2 as
  select TRFFIC_NO as TRAFFIC_NO,
    sum(case when ticket_date_v1 between '01OCT2019'd and '30SEP2020'd then 1 else 0 end) as NDP_vio_cnt_t1,
    sum(case when ticket_date_v1 between '01OCT2018'd and '30SEP2019'd then 1 else 0 end) as NDP_vio_cnt_t2,
    sum(case when ticket_date_v1 between '01OCT2017'd and '30SEP2018'd then 1 else 0 end) as NDP_vio_cnt_t3,
    sum(case when ticket_date_v1 lt '01OCT2017'd then 1 else 0 end) as NDP_vio_cnt_t4
  from public.RTA_NDP_Red_1
  group by TRFFIC_NO;
quit;

Using BY-group processing in a DATA step creates two temporary variables, FIRST.variable and LAST.variable, which flag the first and last observation of each BY group.
In addition, SAS evaluates a comparison (logical) expression to 1 when it is true and to 0 when it is false.
Once you understand both points, the DATA step version is straightforward:
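proc sort data=public.RTA_NDP_Red_1;
  by TRFFIC_NO;
run;
data RTA_NDP_Red_2;
  set public.RTA_NDP_Red_1;
  by TRFFIC_NO;
  /* Reset the four counters at the start of each TRFFIC_NO group */
  if first.TRFFIC_NO then call missing(of NDP_vio_cnt_t1-NDP_vio_cnt_t4);
  /* Sum statements (retained): each comparison adds 1 when true, 0 when false */
  NDP_vio_cnt_t1 + ('01OCT2019'd <= ticket_date_v1 <= '30SEP2020'd);
  NDP_vio_cnt_t2 + ('01OCT2018'd <= ticket_date_v1 <= '30SEP2019'd);
  NDP_vio_cnt_t3 + ('01OCT2017'd <= ticket_date_v1 <= '30SEP2018'd);
  /* Strictly before 01OCT2017, like LT in the SQL version */
  NDP_vio_cnt_t4 + (ticket_date_v1 < '01OCT2017'd);
  /* Write one row per TRFFIC_NO, with the same columns as the PROC SQL result */
  if last.TRFFIC_NO then output;
  keep TRFFIC_NO NDP_vio_cnt_t1-NDP_vio_cnt_t4;
  rename TRFFIC_NO=TRAFFIC_NO;
run;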
Hope it helps
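If sorting the large table in place is undesirable, one alternative (not part of the answer above) is to compute the 0/1 flags in a DATA step view and let PROC SUMMARY add them up with a CLASS statement, which does not require sorted input. This is a minimal sketch that assumes a small hypothetical data set RTA_SAMPLE standing in for PUBLIC.RTA_NDP_RED_1. Note that CLASS holds one summary row per distinct TRFFIC_NO in memory, so with a very large number of IDs the sort plus BY approach may still be the better choice.

```sas
/* Hypothetical sample data standing in for public.RTA_NDP_Red_1 */
data rta_sample;
  input TRFFIC_NO $ ticket_date_v1 :date9.;
  format ticket_date_v1 date9.;
  datalines;
A1 15NOV2019
A1 02FEB2020
A1 10OCT2018
B2 30SEP2017
B2 01OCT2017
;

/* DATA step view: the 0/1 flags are computed on the fly, nothing is stored */
data rta_flags / view=rta_flags;
  set rta_sample;
  NDP_vio_cnt_t1 = ('01OCT2019'd <= ticket_date_v1 <= '30SEP2020'd);
  NDP_vio_cnt_t2 = ('01OCT2018'd <= ticket_date_v1 <= '30SEP2019'd);
  NDP_vio_cnt_t3 = ('01OCT2017'd <= ticket_date_v1 <= '30SEP2018'd);
  NDP_vio_cnt_t4 = (ticket_date_v1 < '01OCT2017'd);
run;

/* CLASS needs no prior sort; NWAY keeps only the per-TRFFIC_NO rows */
proc summary data=rta_flags nway;
  class TRFFIC_NO;
  var NDP_vio_cnt_t1-NDP_vio_cnt_t4;
  output out=rta_counts(drop=_type_ _freq_ rename=(TRFFIC_NO=TRAFFIC_NO)) sum=;
run;
```

For this sample, A1 should get 2, 1, 0, 0 and B2 should get 0, 0, 1, 1.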

Related

Sort all rows by length of string in variable X (longer strings first)

I have a variable UserName that contains IDs of varying length. A shortened example:
How can I sort all rows by variable X so that longer strings are listed first?
Context: this is for calculating HEI-2015 scores with the ASA24 macro, whose code contains this note:
/*Note: Some users have found that the SAS program will drop observations from the analysis if the ID field is not the same length for all observations. To prevent this error, the observations with the longest ID length should be listed first when the data is imported into SAS. */
Use PROC SQL with an ORDER BY clause whose first sort key is computed by a CASE expression.
The expression case when length(X) > 8 then -length(X) else 0 end puts the longest values first and treats all values whose length is at most the cap (8 here) as equal, so those are then ordered by X alone.
ORDER BY length(X) desc, X would also put the longest X values first and break ties by X, but length would then drive the ordering even among values shorter than 8 characters.
data have;
length X $50;
input X;
datalines;
GFHsp036
GFHsp038
GFHsp039
GFHsp040
GFHsp0400
GFHsp0401
GFHsp0402
GFHsp04021
;
proc sql;
create table want as
select * from have
order by
case when length(x) > 8 then -length(X) else 0 end,
X
;
quit;
proc print;
var X / style=[fontfamily='Courier'];
run;
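The same "longest first, then by X" ordering can also be produced without PROC SQL by storing the length in a helper variable and sorting on it. This is a sketch using a hypothetical data set IDS and a hypothetical helper variable X_LEN; neither comes from the original answers.

```sas
/* Hypothetical sample IDs of different lengths */
data ids;
  length X $50;
  input X;
  datalines;
GFHsp036
GFHsp0400
GFHsp04021
GFHsp039
;

/* Keep the length as an explicit sort key */
data ids_len;
  set ids;
  x_len = length(X);
run;

/* Longest first; ties are ordered alphabetically by X; helper dropped on output */
proc sort data=ids_len out=ids_sorted(drop=x_len);
  by descending x_len X;
run;
```

Unlike the CASE version, this ranks every length, so it matches ORDER BY length(X) desc, X rather than the capped ordering.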
Here is probably the simplest way to do this:
data have;
input string $;
datalines;
abcde
ab
a
abcd
abc
;
proc sql;
create table want as
select * from have
order by length(string) desc;
quit;
Re-ordering the IDs did not help in my case; PROC IMPORT needed GUESSINGROWS=MAX.
Please see SAS Macro Truncating IDs
for how to fix the truncated IDs that this question was trying to work around.
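A minimal sketch of the GUESSINGROWS=MAX fix, assuming a hypothetical CSV written to a temporary file in which the longest ID appears only after 30 short ones, that is, past the rows PROC IMPORT scans by default. The fileref IDCSV and the column names are made up for illustration.

```sas
/* Write a hypothetical CSV: 30 short IDs, then one long ID at the end */
filename idcsv temp;
data _null_;
  file idcsv;
  length line $40;
  put 'UserName,Score';
  do i = 1 to 30;
    line = cats('ab', put(i, z2.), ',', i);
    put line;
  end;
  put 'abcdefghijk,99';
run;

/* GUESSINGROWS=MAX scans every row before choosing column types and lengths,
   so the late, longer ID is not truncated */
proc import datafile=idcsv out=ids_imported dbms=csv replace;
  getnames=yes;
  guessingrows=max;
run;
```

Scanning all rows costs time on large files, which is the trade-off for getting the lengths right.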

Why does the PROC SQL SUM function return a count instead of the total of the values?

I am learning PROC SQL in SAS. When I use the SQL SUM function with a comparison operator inside it, the output is the number of rows instead of the sum of the column. How can I get the column sum, and what is the mechanism behind this summation?
data apple;
input target;
cards;
0
1
3
5
;
run;
proc sql;
select sum(target ge 3)
from apple;
quit;
The expected result is 3+5=8,
but the actual result is 2.
proc sql;
select sum(target)
from apple
where target ge 3;
quit;
I believe your code evaluates (target ge 3) as a Boolean expression. Since SAS represents TRUE as 1 and FALSE as 0, the SUM function was adding 0, 0, 1 and 1, which gives 2.
The solution from Craig is actually better, but with a CASE WHEN ... ELSE ... END expression you can do what you originally tried:
proc sql;
select sum(case when target ge 3 then target else 0 end)
from apple;
quit;
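A third option follows directly from the explanation above: multiply each value by the 0/1 result of the comparison, so rows that fail the condition contribute 0. This is a minimal sketch; the table APPLE_SUMS and its column names are made up for illustration.

```sas
data apple;
  input target;
  cards;
0
1
3
5
;

proc sql;
  create table apple_sums as
  select sum(target ge 3)            as n_ge_3   /* 0+0+1+1: a count */
       , sum(target * (target ge 3)) as sum_ge_3 /* 0+0+3+5: a sum   */
  from apple;
quit;
```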

How to recode values of a variable based on the maximum value in the variable, for hundreds of variables?

I want to recode each variable so that its maximum value becomes 1 and every other value becomes 0. For each variable, there may be multiple observations with the max value. The max value of each variable is not fixed, i.e. from cycle to cycle the max value for each variable may change. And there are hundreds of variables, so I cannot "hard-code" anything.
The final product would have the same dimensions as the original table, i.e. equal number of rows and columns as a matrix of 0s and 1s.
This is within SAS. I attempted to calculate the max of each variable, append these maxima to the data as a new observation, and then compare each variable's column against the "max" observation... Looking into examples of the following did not help:
SQL
Array in datastep
proc transpose
formatting
Any insight would be much appreciated.
Here is a version done with SQL:
The idea is that we first calculate the maximum of each VAR group (the second, inner SELECT). Then we join that result back to the original data, and the outer CASE expression sets the flag to 1 when the value equals the maximum and to 0 otherwise.
data begin;
input var value;
cards;
1 1
1 2
1 3
1 2.5
1 1.7
1 3
2 34
2 33
2 33
2 33.7
2 34
2 34
; run;
proc sql;
create table result as
select a.var, a.value, case when a.value = b.maximum then 1 else 0 end as is_max from
(select * from begin) a
left join
(select max(value) as maximum, var from begin group by var) b
on a.var = b.var
;
quit;
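PROC SQL in SAS can also do this without the explicit join: when a query mixes a summary function with detail columns and a GROUP BY, SAS remerges the group statistic back onto every row and writes a NOTE saying so. This is a sketch using a hypothetical data set SCORES shaped like BEGIN; the row order of the result is not guaranteed.

```sas
data scores;
  input var value;
  cards;
1 1
1 3
1 2.5
1 3
2 34
2 33
2 34
;

proc sql;
  create table result_remerge as
  select var, value,
         (value = max(value)) as is_max  /* max() is computed per VAR group */
  from scores
  group by var;
quit;
```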
To avoid "hard-code" you need to use some code generation.
First let's figure out what code you could use to solve the problem. Later we can look into ways to generate that code.
It is probably easiest to do this with PROC SQL code. SAS will allow you to reference the MAX() value of a variable. Also note that SAS evaluates boolean expressions to 1 (TRUE) or 0 (FALSE). So you just want to generate code like:
proc sql;
create table want as
select var1=max(var1) as var1
, var2=max(var2) as var2
from have
;
quit;
To generate the code you need a list of the variables in your source dataset. You can get those with PROC CONTENTS but also with the metadata table (view) DICTIONARY.COLUMNS (also accessible as SASHELP.VCOLUMN from outside PROC SQL).
If the list of variables is small then you could generate the code into a single macro variable.
proc sql noprint;
select catx(' ',cats(name,'=max(',name,')'),'as',name)
into :varlist separated by ','
from dictionary.columns
where libname='WORK' and memname='HAVE'
order by varnum
;
create table want as
select &varlist
from have
;
quit;
A macro variable can hold at most 65,534 characters (about 64K). Each generated term such as abcdefgh=max(abcdefgh) as abcdefgh, plus its separating comma, is 35 characters, so that is enough for roughly 1,800 variables with 8-character names.
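The generation step can be wrapped in a macro so it can be pointed at any table. This is a sketch with a hypothetical macro name, %FLAG_MAX, and a hypothetical test table WIDE; it also restricts the list to numeric columns (TYPE='num'), which the original answer does not do.

```sas
/* Hypothetical helper: 1 where a value equals its column maximum, else 0 */
%macro flag_max(lib=, mem=, out=);
  %local varlist;
  proc sql noprint;
    /* Build "a=max(a) as a,b=max(b) as b,..." from the table's metadata */
    select catx(' ', cats(name, '=max(', name, ')'), 'as', name)
      into :varlist separated by ','
      from dictionary.columns
      where libname = upcase("&lib") and memname = upcase("&mem")
        and type = 'num'
      order by varnum;
    create table &out as
      select &varlist
      from &lib..&mem;
  quit;
%mend flag_max;

data wide;
  input a b c;
  datalines;
1 5 2
3 5 1
2 4 2
;

%flag_max(lib=work, mem=wide, out=wide_flags)
```

Because the query remerges the maxima, SAS writes a NOTE about remerging, and the output row order is not guaranteed to match the input.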
Here is a slightly more complex way that uses PROC SUMMARY and a DATA step with a temporary array. It does not need any code generation.
%let dsin=sashelp.class(obs=10);
%let dsout=want;
%let varlist=_numeric_;
proc summary data=&dsin nway ;
var &varlist;
output out=summary(drop=_type_ _freq_) max= ;
run;
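data &dsout;
  /* Define the input variables in the PDV without reading any data yet */
  if 0 then set &dsin;
  array vars &varlist;
  /* Temporary array for the maxima (named MAXVAL to avoid clashing with the MAX function) */
  array maxval [10000] _temporary_;
  /* First iteration only: read the single SUMMARY row and save the maxima */
  if _n_=1 then do;
    set summary ;
    do _n_=1 to dim(vars);
      maxval[_n_]=vars[_n_];
    end;
  end;
  /* Read each observation and replace every value by the 0/1 result of value=max */
  set &dsin;
  do _n_=1 to dim(vars);
    vars[_n_]=(vars[_n_]=maxval[_n_]);
  end;
run;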
Results:
Obs Name Sex Age Height Weight
1 Alfred M 0 1 1
2 Alice F 0 0 0
3 Barbara F 0 0 0
4 Carol F 0 0 0
5 Henry M 0 0 0
6 James M 0 0 0
7 Jane F 0 0 0
8 Janet F 1 0 1
9 Jeffrey M 0 0 0
10 John M 0 0 0

check all entries in a column sas and return a dummy variable sas

I am new to SAS and want to check whether all entries of a variable in a data set satisfy a condition (namely =1), returning just one dummy variable that is 1 if all entries in the variable are 1 and 0 if at least one is not 1.
Any idea how to do it?
IF colvar = 1 THEN dummy_variable = 1;
creates another variable, dummy_variable, with one value per observation (the same size as the original variable) rather than a single value.
Thank you
* Generate test data;
data have;
colvar=0;
colvar2=0;
do i=1 to 20;
colvar=round(ranuni(0));
output;
end;
drop i;
run;
* Read the input dataset twice. The first pass counts the
* observations and sets each dummy variable to 0 as soon as
* the corresponding variable has a value other than 1 in any
* observation, so a dummy stays 1 only if ALL values are 1.
* The second pass outputs the observations; the dummy
* variables remain unchanged during the second loop.;
data want;
_n_=0;
d_colvar=1;
d_colvar2=1;
do until (eof);
set have end=eof;
if colvar ne 1
then d_colvar=0;
if colvar2 ne 1
then d_colvar2=0;
* etc.... *;
_n_=_n_+1;
end;
do _n_=1 to _n_;
set have;
output;
end;
run;
PROC SQL is a good tool for quickly generating a summary of an arbitrarily defined condition. It is not entirely clear what your condition is. I think you want the ALL_ONE value in the table the code below generates. That will be 1 when every observation has COLVAR=1. Any value that is NOT a one will cause the condition to be false (0), and so ALL_ONE will then have a value of 0 instead of 1.
You could store the result into a small table.
proc sql ;
create table check_if_one as
select min( colvar=1 ) as all_one
, max( colvar=1 ) as any_one
, max( colvar ne 1 ) as any_not_one
, min( colvar ne 1 ) as all_not_one
from my_table
;
quit;
But you could also just store the value into a macro variable that you could easily use later for some purpose.
proc sql noprint ;
select min( colvar=1 ) into :all_one trimmed from my_table ;
quit;
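A short usage sketch for the macro variable, assuming a hypothetical MY_TABLE in which one value is not 1, so ALL_ONE should come out as 0:

```sas
/* Hypothetical data: one of the five values is not 1 */
data my_table;
  input colvar @@;
  datalines;
1 1 1 0 1
;

proc sql noprint;
  select min( colvar=1 ) into :all_one trimmed from my_table;
quit;

/* Use the flag later, e.g. to branch inside a DATA step */
data _null_;
  if &all_one = 1 then put 'NOTE: every COLVAR value is 1.';
  else put 'NOTE: at least one COLVAR value is not 1.';
run;
```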

Count number of 0 values

Similar to here, I can count the number of missing observations:
data dataset;
input a b c;
cards;
1 2 3
0 1 0
0 0 0
7 6 .
. 3 0
0 0 .
;
run;
proc means data=dataset NMISS N;
run;
But how can I also count the number of observations that are 0?
If you want to count the number of observations that are 0, you'd want to use proc tabulate or proc freq, and do a frequency count.
If you have a lot of values and you just want "0/not 0", that's easy to do with a format.
data have;
input a b c;
cards;
1 2 3
0 1 0
0 0 0
7 6 .
. 3 0
0 0 .
;
run;
proc format;
value zerof
0='Zero'
.='Missing'
other='Not Zero';
run;
proc freq data=have;
format _numeric_ zerof.;
tables _numeric_/missing;
run;
Something along those lines. Obviously be careful about _numeric_ as that's all numeric variables and could get messy quickly if you have a lot of them...
I am adding this as an additional answer. It requires SAS/IML (PROC IML).
This uses matrix manipulation to do the count.
(ds=0) -- creates a matrix of 0/1 values (false/true) marking the values equal to 0
[+,] -- the subscript reduction operator: sums over all rows, giving one total per column. With 0/1 values, this is the number of zeros in each column.
` -- the transpose operator (a backquote).
|| -- horizontal concatenation: {0} || {1} = {0 1}
Then we just print the values.
proc iml;
use dataset;
read all var _num_ into ds[colname=names];
close dataset;
ds2 = ((ds=0)[+,])`;
n = nrow(ds);
ds2 = ds2 || repeat(n,ncol(ds),1);
cnames = {"N = 0", "Count"};
mattrib ds2 rowname=names colname=cnames;
print ds2;
quit;
It is easiest to use PROC SQL. You need a UNION to replicate the layout of the MEANS output.
Each SELECT inside the first FROM counts the 0 values of one variable, and the UNION stacks those rows.
The last section just counts the number of observations in DATASET.
proc sql;
select n0.Variable,
n0.N_0 label="Number 0",
n.count as N
from (
select "A" as Variable,
count(a) as N_0
from dataset
where a=0
UNION
select "B" as Variable,
count(b) as N_0
from dataset
where b=0
UNION
select "C" as Variable,
count(c) as N_0
from dataset
where c=0
) as n0,
(
select count(*) as count
from dataset
) as n;
quit;
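The Boolean-summing idea from the SUM question above gives a shorter query without UNION: SUM(a=0) adds 1 for each zero, and a missing value compares as not equal to 0, so it adds nothing. This is a sketch using a hypothetical copy of DATASET named ZEROS_DEMO; it puts the counts on one row instead of one row per variable.

```sas
/* Same values as DATASET above */
data zeros_demo;
  input a b c;
  cards;
1 2 3
0 1 0
0 0 0
7 6 .
. 3 0
0 0 .
;

proc sql;
  create table zero_counts as
  select sum(a=0)  as a_zeros
       , sum(b=0)  as b_zeros
       , sum(c=0)  as c_zeros
       , count(*)  as n_obs
  from zeros_demo;
quit;
```

For this data the counts are 3, 2 and 3 zeros out of 6 observations.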
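PROC FREQ also has an NLEVELS option you could use. It reports the number of distinct levels of each variable, and the one-way tables show how many observations equal 0:
proc freq data=dataset nlevels;
table _numeric_;
run;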