sas run if statement over macro variables


I have the following two sas datasets:
data have ;
input a b;
cards;
1 15
2 10
3 40
4 200
1 25
2 15
3 10
4 75
1 1
2 99
3 30
4 100
;
data ref ;
input x y;
cards;
1 10
2 20
3 30
4 100
;
I would like to have the following dataset:
data want ;
input a b outcome ;
cards;
1 15 0
2 10 1
3 40 0
4 200 0
1 25 0
2 15 1
3 10 1
4 75 1
1 1 1
2 99 0
3 30 1
4 100 1
;
I would like to create a variable 'outcome' from an IF statement on conditions involving the variables a, b, x and y. Because the real 'have' dataset is extremely large, I would like to avoid sorting it and merging the two datasets together (where a = x).
I am trying to use macro variables with the following code:
data _null_ ;
set ref ;
call symput('listx', x) ;
call symput('listy', y) ;
run ;
data want ;
set have ;
if a=&listx and b le &listy then outcome = 1 ; else outcome = 0 ;
run ;
which does not however produce the desired result:
data want ;
input a b outcome ;
cards;
1 15 0
2 10 1
3 40 0
4 200 0
1 25 0
2 15 1
3 10 1
4 75 1
1 1 1
2 99 0
3 30 1
4 100 1
;

I have redone my solution using a hash table. My approach is below:
data ref2(rename=(x=a));
set ref ;
run;
data want;
declare Hash Plan ();
rc = plan.DefineKey ('a'); /*x originally*/
rc = plan.DefineData('a', 'y');
rc = plan.DefineDone();
do until (eof1);
set ref2 end=eof1;
rc = plan.add(); /*add each record from ref2 to plan (hash table)*/
end;
do until (eof2);
set have end=eof2;
call missing(y);
rc = plan.find();
outcome = (rc = 0 and b <= y); /* 1 when a is found in ref and b is at most y, as in the desired output */
output;
end;
stop;
run;
hope it helps
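
The attempt in the question leaves only the last row of ref in the macro variables, because each CALL SYMPUT overwrites the previous value, so &listx and &listy end up as 4 and 100. If you do want to stay with macro variables, one option is one macro variable per reference row, looked up at run time with SYMGETN. This is only a minimal sketch, assuming every value of a in have also appears as x in ref; the names y_1 to y_4 and want_macro are made up for this example.

```sas
/* Sample data from the question */
data ref;
  input x y;
  datalines;
1 10
2 20
3 30
4 100
;

data have;
  input a b @@;   /* @@ reads several observations per data line */
  datalines;
1 15  2 10  3 40  4 200
1 25  2 15  3 10  4 75
1 1   2 99  3 30  4 100
;

/* One macro variable per reference row: y_1=10, y_2=20, y_3=30, y_4=100 */
data _null_;
  set ref;
  call symputx(cats('y_', x), y);
run;

data want_macro;
  set have;
  /* SYMGETN resolves the macro variable while the step runs, row by row */
  outcome = (b <= symgetn(cats('y_', a)));
run;
```

On a very large table the hash lookup above is likely to be the better choice, since SYMGETN is called once per row.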

Related

SAS : getting list of numbers based on reducing months

I have this data
data have;
input cust_id $ pmt months;
datalines;
AA 100 0
AA 50 1
AA 200 2
AA 350 3
AA 150 4
AA 700 5
BB 500 0
BB 300 1
BB 1000 2
BB 800 3
run;
and I'd like to generate an output that looks like this
data want;
input cust_id $ pmt months i;
datalines;
AA 100 0 0
AA 50 0 1
AA 200 0 2
AA 350 0 3
AA 150 0 4
AA 700 0 5
AA 50 1 0
AA 200 1 1
AA 350 1 2
AA 150 1 3
AA 700 1 4
AA 200 2 0
AA 350 2 1
AA 150 2 2
AA 700 2 3
AA 350 3 0
AA 150 3 1
AA 700 3 2
AA 150 4 0
AA 700 4 1
AA 700 5 0
BB 500 0 0
BB 300 0 1
BB 1000 0 2
BB 800 0 3
BB 300 1 0
BB 1000 1 1
BB 800 1 2
BB 1000 2 0
BB 800 2 1
BB 800 3 0
run;
There are a few thousand rows with different cust_id values and different numbers of months. I tried joining tables, but that did not give me the sequence 100 50 200 350 150 700 (for cust_id AA). I could only replicate 100 when months is 0, 50 when months is 1, and so on. I created a variable maxval, which is the maximum months value. My code is something like this:
data temp1;
set have;
do i = 0 to maxval;
if (months <=maxval) then output;
end;
I thought of creating a unique key to join my have data and temp1 data, but it could only give me
AA 100 0 0
AA 50 0 1
AA 200 0 2
AA 350 0 3
AA 150 0 4
AA 700 0 5
AA 100 1 0
AA 50 1 1
AA 200 1 2
AA 350 1 3
AA 150 1 4
AA 100 2 0
AA 50 2 1
AA 200 2 2
AA 350 2 3
AA 100 3 0
AA 50 3 1
AA 200 3 2
AA 100 4 0
AA 50 4 1
AA 100 5 0
Any thoughts or different approach on how to generate my want table? Thank you!

This problem is a little tricky because you have things going in three directions:
The number of group repetitions descends from group count. Within each repetition:
The payments item start index ascends and terminates at group count
The months (as I) item start index is 1 and termination descends from group count
SQL
One SQL approach is a three-way reflexive join within each group. The months values act as a within-group index and must increase by 1 starting from 0 for this to work.
proc sql;
create table want as
select X.cust_id, Z.pmt, X.months, Y.months as i
from have as X
join have as Y on X.cust_id = Y.cust_id
join have as Z on Y.cust_id = Z.cust_id
where
X.months + Y.months = Z.months
order by
X.cust_id, X.months, Z.months
;
quit;
DATA Step
A DOW loop is used to count the group size. 2-deep looping crosses the combinations and three point= values are computed (finagled) to retrieve the relevant values.
data want2;
if 0 then set have; * prep pdv to match have;
retain point_end ;
point_start = sum(point_end,0);
do group_count = 1 by 1 until (last.cust_id);
set have(keep=cust_id);
by cust_id;
end;
do index1 = 1 to group_count;
point1 = point_start + index1;
set have (keep=months) point = point1;
do index2 = 0 to group_count - index1 ;
point2 = point_start + index1 + index2;
set have (keep=pmt) point=point2;
point3 = point_start + index2 + 1;
set have (keep=months rename=months=i) point=point3;
output;
end;
end;
point_end = point1;
keep cust_id pmt months i;
run;

Try the following:
data want(drop = start_obs limit j);
retain start_obs 1;
/* read by cust_id group */
do until(last.cust_id);
set have end = last_obs;
by cust_id;
end;
limit = months;
do j = 0 to limit;
i = 0;
do obs_num = start_obs + j to start_obs + limit;
/* read specific observations using direct access */
set have point = obs_num;
months = j;
output;
i = i + 1;
end;
end;
/* prepare for next direct access read */
start_obs = start_obs + limit + 1; /* advance past the current group, so that more than two groups work */
if last_obs then
stop;
run;

Identifying first occurrence after trigger event

I have a big panel dataset that looks somewhat like this:
data have;
input id t a b ;
datalines;
1 1 0 0
1 2 0 0
1 3 1 0
1 4 0 0
1 5 0 1
1 6 1 0
1 7 0 0
1 8 0 0
1 9 0 0
1 10 0 1
2 1 0 0
2 2 1 0
2 3 0 0
2 4 0 0
2 5 0 1
2 6 0 1
2 7 0 1
2 8 0 1
2 9 1 0
2 10 0 1
3 1 0 0
3 2 0 0
3 3 0 0
3 4 0 0
3 5 0 0
3 6 0 0
3 7 1 0
3 8 0 0
3 9 0 0
3 10 0 0
;
run;
For every ID I want to record all 'trigger' events, namely when a=1, and then I need to know how long it takes until the next occurrence of b=1. The final output should then give me the following:
data want;
input id a_no a_t b_t diff ;
datalines;
1 1 3 5 2
1 2 6 10 4
2 1 2 5 3
2 2 9 10 1
3 1 7 . .
;
run;
It is of course no problem to get all a=1 and b=1 events, but as it is a very big dataset with a lot of both events for every ID I am searching for an elegant and straight-forward solution. Any ideas?

Here's a fairly simple SQL approach that gives more or less the desired output:
proc sql;
create table want
as select
t1.id,
t1.t as a_t,
t2.t as b_t,
t2.t - t1.t as diff
from
have(where = (a=1)) t1
left join
have(where = (b=1)) t2
on
t1.id = t2.id
and t2.t > t1.t
group by t1.id, t1.t
having diff = min(diff)
;
quit;
The only part missing is a_no. This sort of row-incrementing ID is quite a lot of work to generate consistently in SQL, but it's trivial with an extra data step:
data want;
set want;
by id;
if first.id then a_no = 0;
a_no + 1;
run;

An elegant DATA step solution can use nested DOW loops. It's straightforward once you understand DOW loops.
data want(keep=id--diff);
length id a_no a_t b_t diff 8;
do until (last.id); * process each group;
do a_no = 1 by 1 until(last.id); * counter for each output;
do until ( output_condition or end); * process each triggering state change;
SET have end=end; * read data;
by id; * setup first. last. variables for group;
if a=1 then a_t = t; * detect and record start of trigger state;
output_condition = (b=1 and t > a_t > 0); * evaluate for proper end of trigger state;
end;
if output_condition then do;
b_t = t; * compute remaining info at output point;
diff = b_t - a_t;
OUTPUT;
a_t = .; * reset trigger state tracking variables;
b_t = .;
end;
else
OUTPUT; * end of data reached without triggered output;
end;
end;
run;
Note: A SQL way (not shown) can use self join within groups.

SAS, calculate row difference

data test;
input ID month d_month;
datalines;
1 59 0
1 70 11
1 80 21
2 10 0
2 11 1
2 13 3
3 5 0
3 9 4
4 8 0
;
run;
I have two columns of data, ID and month. The first column is the ID; the same ID may have multiple rows (1-5). The second column is the enrolled month. I want to create a third column, d_month, that holds the difference between the current month and the initial month for each ID.

You can do it like this:
data test;
input ID month d_month;
datalines;
1 59 0
1 70 11
1 80 21
2 10 0
2 11 1
2 13 3
3 5 0
3 9 4
4 8 0
;
run;
data calc;
set test;
by id;
retain initial_month; /* first month seen for the current ID */
if first.id then initial_month = month;
calc_month = month - initial_month; /* 0 on the first row of each ID */
run;
Krs

subset of dataset using first and last in sas

Hi I am trying to subset a dataset which has following
ID sal count
1 10 1
1 10 2
1 10 3
1 10 4
2 20 1
2 20 2
2 20 3
3 30 1
3 30 2
3 30 3
3 30 4
I want to take out only those IDs who are recorded 4 times.
I wrote like
data AN; set BU
if last.count gt 4 and last.count lt 4 then delete;
run;
But there is something wrong.

EDIT - Thanks for clarifying. Based on your needs, PROC SQL will be more direct:
proc sql;
CREATE TABLE AN as
SELECT * FROM BU
GROUP BY ID
HAVING MAX(COUNT) = 4
;quit;
For posterity, here is how you could do it with only a data step:
In order to use first. and last., you need to use a by clause, which requires sorting:
proc sort data=BU;
by ID DESCENDING count;
run;
When using a SET statement BY ID, first.ID will be equal to 1 (TRUE) on the first instance of a given ID, 0 (FALSE) for all other records.
data AN;
set BU;
by ID;
retain keepMe;
If first.ID THEN DO;
IF count = 4 THEN keepMe=1;
ELSE keepMe=0;
END;
if keepMe=0 THEN DELETE;
run;
During the datastep BY ID, your data will look like:
ID sal count keepMe first.ID
1 10 4 1 1
1 10 3 1 0
1 10 2 1 0
1 10 1 1 0
2 20 3 0 1
2 20 2 0 0
2 20 1 0 0
3 30 4 1 1
3 30 3 1 0
3 30 2 1 0
3 30 1 1 0

If I understand correctly, you are trying to extract all observations that are repeated four times or more. If so, your use of last.count and first.count is wrong: last.var is a boolean that indicates whether an observation is the last one in its BY group. Have a look at Tim's suggestion.
In order to extract all observations that are repeated four times or more, I would suggest to use the following PROC SQL:
PROC SQL;
CREATE TABLE WORK.WANT AS
SELECT /* COUNT_of_ID */
(COUNT(t1.ID)) AS COUNT_of_ID,
t1.ID,
t1.SAL,
t1.count
FROM WORK.HAVE t1
GROUP BY t1.ID
HAVING (CALCULATED COUNT_of_ID) ge 4
ORDER BY t1.ID,
t1.SAL,
t1.count;
QUIT;
Result:
1 10 1
1 10 2
1 10 3
1 10 4
3 30 1
3 30 2
3 30 3
3 30 4

A slight variation on Tim's answer, assuming you don't necessarily have the count variable:
proc sql;
CREATE TABLE AN as
SELECT * FROM BU
GROUP BY ID
HAVING Count(ID) >= 4;
quit;
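
For completeness, the same selection can be sketched in a single DATA step without relying on the count variable, using two DOW loops: the first pass counts the rows of an ID, the second re-reads those rows and outputs them only when the count is 4. This is an illustration rather than anything taken from the answers above; it assumes BU is already sorted by ID, and the names AN_DOW, n and i are made up for this example.

```sas
/* Sample data from the question */
data BU;
  input ID sal count @@;   /* @@ reads several observations per data line */
  datalines;
1 10 1  1 10 2  1 10 3  1 10 4
2 20 1  2 20 2  2 20 3
3 30 1  3 30 2  3 30 3  3 30 4
;

data AN_DOW;
  /* Pass 1: count the rows in the current ID group */
  do n = 1 by 1 until (last.ID);
    set BU;
    by ID;
  end;
  /* Pass 2: re-read the same n rows and keep them if the group has exactly 4 */
  do i = 1 to n;
    set BU;
    if n = 4 then output;
  end;
  drop n i;
run;
```

Change the test to n >= 4 if IDs recorded more than four times should be kept as well.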

How to show variable value by proc tabulate in sas?

How can I manage proc tabulate to show the value of a variable with missing value instead of its statistic? Thanks!
For example, I want to show the value of sym. It takes value 'x' or missing value. How can I do it?
Sample code:
data test;
infile datalines truncover; /* keep a missing sym from pulling in the next line */
input tx mod bm $ yr sym $;
datalines;
1 1 a 0 x
1 2 a 0 x
1 3 a 0 x
2 1 a 0 x
2 2 a 0 x
2 3 a 0 x
3 1 a 0
3 2 a 0
3 3 a 0 x
1 1 b 0 x
1 2 b 0
1 3 b 0
1 4 b 0
1 5 b 0
2 1 b 0
2 2 b 0
2 3 b 0
2 4 b 0
2 5 b 0
3 1 b 0 x
3 2 b 0
3 3 b 0
1 1 c 0
1 2 c 0 x
1 3 c 0
2 1 c 0
2 2 c 0
2 3 c 0
3 1 c 0
3 2 c 0
3 3 c 0
1 3 a 1 x
2 3 a 1
3 3 a 1
1 3 b 1
2 3 b 1
3 3 b 1
1 3 c 1 x
2 3 c 1
3 3 c 1
;
run;
proc tabulate data=test;
class yr bm tx mod ;
var sym;
table yr*bm, tx*mod;
run;

proc tabulate data=test;
class tx mod bm yr sym;
table yr*bm, tx*mod*sym*n;
run;
That gives you a 1 for each SYM=x, because N counts the rows in each cell. Rows where SYM is missing are dropped, since PROC TABULATE excludes observations with a missing class variable by default, so you lose some combinations from your example table. (You could easily format the column with a format that defines 1 = 'x'.)
proc tabulate data=test;
class tx mod bm yr;
class sym /missing;
table yr*bm, tx*mod*sym=' '*n;
run;
That gives you all of your combinations of the 4 main variables, but includes missing syms as their own column.
If you want to have your cake and eat it too, then you need to redefine SYM to be a numeric variable, so you can use it as a VAR.
proc format;
invalue ISYM
x=1
;
value FSYM
1='x';
quit;
data test;
infile datalines truncover;
input tx mod bm $ yr sym :ISYM.;
format sym FSYM.;
datalines;
1 1 a 0 x
1 2 a 0 x
1 3 a 0 x
... more lines ...
;
run;
proc tabulate data=test;
class tx mod bm yr;
var sym;
table yr*bm, tx*mod*sym*sum*f=FSYM.;
run;
All of these assume these are unique combination rows. If you start having multiples of yr*bm*tx*mod, you would have a problem here as this wouldn't give you the expected result (sum 1+1+1=3 would not give you an 'x').
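
If duplicate combinations are possible, one way around the caveat above is to request MAX instead of SUM, so a cell still evaluates to 1 however many 'x' rows feed it. This is a small sketch rather than part of the original answer; the dataset test_dup is invented for the example and deliberately repeats one row.

```sas
proc format;
  invalue isym 'x' = 1;   /* read 'x' as the number 1 */
  value fsym 1 = 'x';     /* display 1 as 'x' again */
run;

data test_dup;
  infile datalines truncover;
  input tx mod bm $ yr sym :isym.;
  datalines;
1 1 a 0 x
1 1 a 0 x
1 2 a 0
2 1 a 0 x
;
run;

proc tabulate data=test_dup;
  class tx mod bm yr;
  var sym;
  /* MAX stays at 1 for the duplicated tx=1, mod=1 cell, where SUM would give 2 */
  table yr*bm, tx*mod*sym=' '*max=' '*f=fsym.;
run;
```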
