Read previous and next observations - sas

I have a dataset like this(sp is an indicator):
datetime sp
ddmmyy:10:30:00 N
ddmmyy:10:31:00 N
ddmmyy:10:32:00 Y
ddmmyy:10:33:00 N
ddmmyy:10:34:00 N
And I would like to extract observations with "Y" and also the previous and next one:
ID sp
ddmmyy:10:31:00 N
ddmmyy:10:32:00 Y
ddmmyy:10:33:00 N
I tired to use "lag" and successfully extract the observations with "Y" and the next one, but still have no idea about how to extract the previous one.
Here is my try:
data surprise_6_step3; set surprise_6_step2;
length lag_sp $1;
lag_sp=lag(sp);
if sp='N' and lag(sp)='N' then delete;
run;
and the result is:
ID sp
ddmmyy:10:32:00 Y
ddmmyy:10:33:00 N
Any methods to extract the previous observation also?
Thx for any help.

Try using the point option in set statement in data step.
Like this:
data extract;
set surprise_6_step2 nobs=nobs;
if sp = 'Y' then do;
current = _N_;
prev = current - 1;
next = current + 1;
if prev > 0 then do;
set x point = prev;
output;
end;
set x point = current;
output;
if next <= nobs then do;
set x point = next;
output;
end;
end;
run;
There is an implicite loop through dataset when you use it in set statement.
_N_ is an automatic variable that contains information about what observation is implicite loop on (starts from 1). When you find your value, you store the value of _N_ into variable current so you know on which row you have found it. nobs is total number of observations in a dataset.
Checking if prev is greater then 0 and if next is less then nobs avoids an error if your row is first in a dataset (then there is no previous row) and if your row is last in a dataset (then there is no next row).

/* generate test data */
data test;
do dt = 1 to 100;
sp = ifc( rand("uniform") > 0.75, "Y", "N" );
output;
end;
run;
proc sql;
create table test2 as
select *,
monotonic() as _n
from test
;
create table test3 ( drop= _n ) as
select a.*
from test2 as a
full join test2 as b
on a._n = b._n + 1
full join test2 as c
on a._n = c._n - 1
where a.sp = "Y"
or b.sp = "Y"
or c.sp = "Y"
;
quit;

Related

Use a macro instead of 25 proc sql steps?

I have a SAS code (SQL) that has to repeat for 25 times; for each month/year combination (see code below). How can I use a macro in this code?
proc sql;
create table hh_oud_AUG_17 as
select hh_key
,sum(RG_count) as RG_count_aug_17
,case when sum(RG_count) >=2 then 1 else 0 end as loyabo_recht_aug_17
from basis_RG_oud
where valid_from_dt <= "01AUG2017"d <= valid_to_dt
group by hh_key
order by hh_key
;
quit;
proc sql;
create table hh_oud_SEP_17 as
select hh_key
,sum(RG_count) as RG_count_sep_17
,case when sum(RG_count) >=2 then 1 else 0 end as loyabo_recht_sep_17
from basis_RG_oud
where valid_from_dt <= "01SEP2017"d <= valid_to_dt
group by hh_key
order by hh_key
;
quit;
If you use a data step to do this, you can put all the desired columns in the same output dataset rather than using a macro to create 25 separate datasets:
/*Generate lists of variable names*/
data _null_;
stem1 = "RG_count_";
stem2 = "loyabo_recht_";
month = '01aug2017'd;
length suffix $4 vlist1 vlist2 $1000;
do i = 0 to 24;
suffix = put(intnx('month', month, i, 's'), yymmn4.);
vlist1 = catx(' ', vlist1, cats(stem1,suffix));
vlist2 = catx(' ', vlist2, cats(stem2,suffix));
end;
call symput("vlist1",vlist1);
call symput("vlist2",vlist2);
run;
%put vlist1 = &vlist1;
%put vlist2 = &vlist2;
/*Produce output table*/
data want;
if 0 then set have;
start_month = '01aug2017'd;
array rg_count[2, 0:24] &vlist1 &vlist2;
do _n_ = 1 by 1 until(last.hh_key);
set basis_RG_oud;
by hh_key;
do i = 0 to hbound2(rg_count);
if valid_from_dt <= intnx('month', start_month, i, 's') <= valid_to_dt
then rg_count[1,i] = sum(rg_count[1,i],1);
end;
end;
do _n_ = 1 to _n_;
set basis_RG_oud;
do i = 0 to hbound2(rg_count);
rg_count[2,i] = rg_count[1,i] >= 2;
end;
end;
run;
Create a second data set that enumerates (is a list of) the months to be examined. Cross Join the original data to that second data set. Create a single output table (or view) that contains the month as a categorical variable and aggregates based on that. You will be able to by-group process, classify or subset based on the month variable.
data months;
do month = '01jan2017'd to '31dec2018'd;
output;
month = intnx ('month', month, 0, 'E');
end;
format month monyy7.;
run;
proc sql;
create table want as
select
month, hh_key,
sum(RG_count) as RG_count,
case when sum(RG_count) >=2 then 1 else 0 end as loyabo_recht
from
basis_RG_oud
cross join
months
where
valid_from_dt <= month <= valid_to_dt
group
by month, hh_key
order
by month, hh_key
;
…
/* Some analysis */
BY MONTH;
…
/* Some tabulation */
CLASS MONTH;
TABLE … MONTH …
WHERE year(month) = 2018;

Automate check for number of distinct values SAS

Looking to automate some checks and print some warnings to a log file. I think I've gotten the general idea but I'm having problems generalising the checks.
For example, I have two datasets my_data1 and my_data2. I wish to print a warning if nobs_my_data2 < nobs_my_data1. Additionally, I wish to print a warning if the number of distinct values of the variable n in my_data2 is less than 11.
Some dummy data and an attempt of the first check:
%LET N = 1000;
DATA my_data1(keep = i u x n);
a = -1;
b = 1;
max = 10;
do i = 1 to &N - 100;
u = rand("Uniform"); /* decimal values in (0,1) */
x = a + (b-a) * u; /* decimal values in (a,b) */
n = floor((1 + max) * u); /* integer values in 0..max */
OUTPUT;
END;
RUN;
DATA my_data2(keep = i u x n);
a = -1;
b = 1;
max = 10;
do i = 1 to &N;
u = rand("Uniform"); /* decimal values in (0,1) */
x = a + (b-a) * u; /* decimal values in (a,b) */
n = floor((1 + max) * u); /* integer values in 0..max */
OUTPUT;
END;
RUN;
DATA _NULL_;
FILE "\\filepath\log.txt" MOD;
SET my_data1 NOBS = NOBS1 my_data2 NOBS = NOBS2 END = END;
IF END = 1 THEN DO;
PUT "HERE'S A HEADER LINE";
END;
IF NOBS1 > NOBS2 AND END = 1 THEN DO;
PUT "WARNING!";
END;
IF END = 1 THEN DO;
PUT "HERE'S A FOOTER LINE";
END;
RUN;
How can I set up the check for the number of distinct values of n in my_data2?
A proc sql way to do it -
%macro nobsprint(tab1,tab2);
options nonotes; *suppresses all notes;
proc sql;
select count(*) into:nobs&tab1. from &tab1.;
select count(*) into:nobs&tab2. from &tab2.;
select count(distinct n) into:distn&tab2. from &tab2.;
quit;
%if &&nobs&tab2. < &&nobs&tab1. %then %put |WARNING! &tab2. has less recs than &tab1.|;
%if &&distn&tab2. < 11 %then %put |WARNING! distinct VAR n count in &tab2. less than 11|;
options notes; *overrides the previous option;
%mend nobsprint;
%nobsprint(my_data1,my_data2);
This would break if you have to specify libnames with the datasets due to the .. And, you can use proc printto log to print it to a file.
For your other part as to just print the %put use the above as a call -
filename mylog temp;
proc printto log=mylog; run;
options nomprint nomlogic;
%nobsprint(my_data1,my_data2);
proc printto; run;
This won't print any erroneous text to SAS log other than your custom warnings.
#samkart provided perhaps the most direct, easily understood way to compare the obs counts. Another consideration is performance. You can get them without reading the entire data set if your data set has millions of obs.
One method is to use nobs= option in the set statement like you did in your code, but you unnecessarily read the data sets. The following will get the counts and compare them without reading all of the observations.
62 data _null_;
63 if nobs1 ne nobs2 then putlog 'WARNING: Obs counts do not match.';
64 stop;
65 set sashelp.cars nobs=nobs1;
66 set sashelp.class nobs=nobs2;
67 run;
WARNING: Obs counts do not match.
Another option is to get the counts from sashelp.vtable or dictionary.tables. Note that you can only query dictionary.tables with proc sql.

check all entries in a column sas and return a dummy variable sas

I am new in SAS and want to check if all entries in a variable in a data set satisfy a condition (namely =1) and return just one dummy variable 0 pr one depending whether all entries in the variable are 1 or at least one is not 1.
Any idea how to do it?
IF colvar = 1 THEN dummy_variable = 1
creates another variable dummy_variable of the same size as the original variable.
Thank you
* Generate test data;
data have;
colvar=0;
colvar2=0;
do i=1 to 20;
colvar=round(ranuni(0));
output;
end;
drop i;
run;
* Read the input dataset twice, first counting the number
* of observations and setting the dummy variables to 1 if
* the corresponding variable has the value 1 in any obser-
* vation, second outputting the result. The dummy variables
* remain unchanged during the second loop.;
data want;
_n_=0;
d_colvar=0;
d_colvar2=0;
do until (eof);
set have end=eof;
if colvar = 1
then d_colvar=1;
if colvar2 = 1
then d_colvar2=1;
* etc.... *;
_n_=_n_+1;
end;
do _n_=1 to _n_;
set have;
output;
end;
run;
PROC SQL is a good tool for quickly generating a summary of an arbitrarily defined condition. What exactly your condition is is not clear. I think you want the ALL_ONE value in the table the code below generates. That will be 1 when every observation has COLVAR=1. Any value that is NOT a one will cause the condition to be false (0) and so ALL_ONE will then have a value of 0 instead of 1.
You could store the result into a small table.
proc sql ;
create table check_if_one as
select min( colvar=1 ) as all_one
, max( colvar=1 ) as any_one
, max( colvar ne 1 ) as any_not_one
, min( colvar ne 1 ) as all_not_one
from my_table
;
quit;
But you could also just store the value into a macro variable that you could easily use later for some purpose.
proc sql noprint ;
select min( colvar=1 ) into :all_one trimmed from my_table ;
quit;

Split a row into multiple rows in SAS enterprise guide

I need help to split a row into multiple rows when the value on the row is something like 1-5. The reason is that I need to count 1-5 to become 5, and not 1, as it is when it count on one row.
I've a ID, the value and where it belong.
As exempel:
ID Value Page
1 1-5 2
The output I want is something like this:
ID Value Page
1 1 2
1 2 2
1 3 2
1 4 2
1 5 2
I've tried using a IF-statement
IF bioVerdi='1-5' THEN
DO;
..
END;
So I don't know what I should put between the DO; and END;. Any clues to help me out here?
You need to loop over the values inside your range and OUTPUT the values. The OUTPUT statement causes the Data Step to write a record to the output data set.
data want;
set have;
if bioVerdi = '1-5' then do;
do value=1 to 5;
output;
end;
end;
Here is another solution that is less restricted to the actual value '1-5' given in your example, but would work for any value in the format '1-6', '1-7', '1-100', etc.
*this is the data you gave ;
data have ;
ID = 1 ;
value = '1-5';
page = 2;
run;
data want ;
set have ;
min = scan( value, 1, '-' ) ; * get the 1st word, delimited by a dash ;
max = scan( value, 2, '-' ) ; * get the 2nd word, delimited by a dash ;
/*loop through the values from min to max, and assign each value as the loop iterates to a new column 'NEWVALUE.' Each time the loop iterates through the next value, output a new line */
do newvalue = min to max ;
output ;
end;
/*drop the old variable 'value' so we can rename the newvalue to it in the next step*/
drop value min max;
/*newvalue was a temporary name, so renaming here to keep the original naming structure*/
rename newvalue = value ;
run;

Find the next non blank value in SAS

I am trying to linearly interpolate values within a panel set data. So I am find the next non zero value within a variable if the current value of the variable is "."
For example if X = { 1, 2, . , . , . ,7), I want to store 7 as a variable "Y" and subject the lag value of X from it as the numerator of the slope. Can anyone help with this step?
If you cannot transpose your data, here is a way that will work for your given example:
data test;
input id $3. x best12.;
datalines;
AAA 1
BBB 2
CCC .
DDD .
EEE .
FFF 7
;
run;
data test2;
set test;
n = _n_;
if x ne .;
run;
data test3;
set test2;
lagx = lag(x);
lagn = lag(n);
if _n_ > 1 and n ne lagn + 1 then do;
postiondiff = n - lagn;
valuediff = x - lagx;
do i = (lagx + ((x-lagx)/(n-lagn))) to x by ((x-lagx)/(n-lagn));
x = i;
output;
end;
end;
else output;
keep x;
run;
data test4;
merge test test3 (rename = (x=newx));
run;
So we are basically rebuilding the variable with the interpolated values, then remerging it into the original dataset without a by variable which will line up all the new interpolated data with the missing points.
Is there a way you could transpose all your data? Interpolating like that is much easier when all the data you need is in a single observation. Like this:
data test;
input x best12.;
datalines;
1
2
.
.
.
7
;
run;
proc transpose data = test
out = test2;
run;
data test3;
set test2;
array xvalues {*} COL1-COL6;
array interpol {4,10} begin1-begin10 end1-end10 begposition1-begposition10 endposition1-endposition10;
rangenum = 1;
* Find the endpoints of the missing ranges;
do i = 1 to dim(xvalues);
if xvalues{i} ne . then lastknownx = xvalues{i};
else do;
interpol{1,rangenum} = lastknownx;
if interpol{3,rangenum} = . then interpol{3,rangenum} = i - 1;
end;
if i > 1 and xvalues{i} ne . then do;
if xvalues{i-1} = . then do;
interpol{2,rangenum} = xvalues{i};
interpol{4,rangenum} = i;
rangenum = rangenum + 1;
end;
end;
end;
* Interpolate;
rangenum = 1;
do j = 1 to dim(xvalues);
if xvalues{j} = . then do;
xvalues{j} = interpol{1,rangenum} + (j-interpol{3,rangenum})*((interpol{2,rangenum}-interpol{1,rangenum})/(interpol{4,rangenum}-interpol{3,rangenum}));
end;
else if j > 1 and xvalues{j} ne . then do;
if xvalues{j-1} = . then rangenum = rangenum + 1;
end;
end;
keep col1-col6;
run;
That can handle up to 10 different missing ranges per observation, though you could tweak the code to handle much more than that by creating bigger arrays.
The SAS data step reads a dataset one record at a time from top to bottom. So at record i, it can't access i+1 because it hasn't read it yet; it can only access i-1. Assume you have a dataset with a variable x.
data intrpl;
retain _x;
set yourdata;
by x notsorted;
if not missing(x) then do;
_x = x;
if last.x then do;
slope = _x - lag(_x);
output;
end;
end;
run;
Transposing can get kind of messy if x takes on a lot of values, so I recommend this method. I hope it helps!