I have a dataset like this (type is an indicator):
datetime type
...
ddmmyy:10:30:00 0
ddmmyy:10:31:00 0
ddmmyy:10:32:00 1
ddmmyy:10:33:00 0
ddmmyy:10:34:00 1
ddmmyy:10:35:00 0
...
I am trying to extract the rows with type 1 together with the previous and the next row, i.e. a (-1,+1) window around each type 1 row:
datetime type
...
ddmmyy:10:31:00 0
ddmmyy:10:32:00 1
ddmmyy:10:33:00 0
ddmmyy:10:34:00 1
ddmmyy:10:35:00 0
...
I found a similar post here. I copied and pasted the code, but I am not sure what 'x' means in that code. SAS gives me 'File WORK.x does not exist'.
Can someone help me out? Thx.
The X data set in the other post is the same source table you are filtering, so the logical order of the code is:
Every row in the table 'Have' is read in turn; _N_ holds the current row number.
If Type = 1, then Set Have Point=... jumps to the given row of the 'Have' table and outputs that row to the new table 'want'; processing then continues with the next row. The pointer can be the current row (_N_), the previous row (_N_ - 1) or the next row (_N_ + 1). (The two inner IF statements handle the first and last rows, where there is no previous or no next row.)
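As a minimal sketch of what POINT= does on its own, the step below fetches a single row by number and writes it to the log. It assumes the HAVE table built in the full code that follows; the variable name rownum is only illustrative. With the sample data, row 3 is the 10:32:00 record.

```sas
data _null_;
  rownum = 3;                /* illustrative: the row to fetch */
  set have point=rownum;     /* direct access: jump straight to that row */
  put datetime= type=;       /* write the fetched values to the log */
  stop;                      /* POINT= never reaches end of file, so end the step explicitly */
run;
```

The full code below does not need STOP because its first SET statement reads sequentially and ends the step when the table is exhausted.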
Full Working Code:
data have;
length datetime $23.;
input datetime $ type ;
datalines;
ddmmyy:10:30:00 0
ddmmyy:10:31:00 0
ddmmyy:10:32:00 1
ddmmyy:10:33:00 0
ddmmyy:10:34:00 1
ddmmyy:10:35:00 0
;
run;
data want;
set have nobs=nobs;
if type = 1 then do;
current = _N_;
prev = current - 1;
next = current + 1;
if prev > 0 then do;
set have point = prev;
output;
end;
set have point = current;
output;
if next <= nobs then do;
set have point = next;
output;
end;
end;
run;
proc sort data=want noduprecs;
by _all_ ;
run;
Note: I added an extra PROC SORT step to remove duplicate rows; a row can be both the next row of one type 1 record and the previous row of another.
Output:
datetime=ddmmyy:10:31:00 type=0
datetime=ddmmyy:10:32:00 type=1
datetime=ddmmyy:10:33:00 type=0
datetime=ddmmyy:10:34:00 type=1
datetime=ddmmyy:10:35:00 type=0
For your example data, which does not have any id or group variable, it should be pretty straightforward. Instead of thinking about moving back and forth in the file, just create new variables that contain the previous (LAG_TYPE) and next (LEAD_TYPE) value of TYPE. Then your requirement to keep the observation before the one with TYPE=1 translates to keeping the observations where LEAD_TYPE=1.
Let's convert your sample data into a dataset.
data have ;
input datetime :$15. type ;
cards;
ddmmyy:10:30:00 0
ddmmyy:10:31:00 0
ddmmyy:10:32:00 1
ddmmyy:10:33:00 0
ddmmyy:10:34:00 1
ddmmyy:10:35:00 0
;
Rather than actually keeping the required observations I will make a new variable KEEP that will be true for records that meet your criteria.
data want ;
recno+1;
set have end=eof;
lag_type=lag(type);
if not eof then set have(firstobs=2 keep=type rename=(type=lead_type));
else lead_type=.;
keep= (type=1 or lag_type=1 or lead_type=1) ;
run;
Here is the result.
recno  datetime         type  lag_type  lead_type  keep
1      ddmmyy:10:30:00  0     .         0          0
2      ddmmyy:10:31:00  0     0         1          1
3      ddmmyy:10:32:00  1     0         0          1
4      ddmmyy:10:33:00  0     1         1          1
5      ddmmyy:10:34:00  1     0         0          1
6      ddmmyy:10:35:00  0     1         .          1
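To get the final table from the flagged one, a follow-up step can filter on KEEP. This is only a sketch: the output name want_subset is made up here, and it assumes the WANT dataset produced by the step above.

```sas
data want_subset;                      /* hypothetical output name */
  set want;
  where keep;                          /* read only the rows where KEEP is 1 */
  drop recno lag_type lead_type keep;  /* helper columns are no longer needed */
run;
```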
Move the next observation up so that two observations can be compared on the same row, and use LAG to compare the current observation with the previous one.
data have;
length datetime $23.;
input datetime $ type ;
datalines;
ddmmyy:10:30:00 0
ddmmyy:10:31:00 0
ddmmyy:10:32:00 1
ddmmyy:10:33:00 0
ddmmyy:10:34:00 1
ddmmyy:10:35:00 0
;
run;
data want;
merge have have(firstobs=2 keep=type rename=(type=_type));
_lag = lag(type); /* call LAG on every row, outside the condition, so its queue stays in step */
if max(type,_type) or max(type,_lag) ;
drop _type _lag;
run;
Suppose I have diagnostic codes such as 001, v58, ..., 142, ... How can I construct indicator columns from the codes, each set to 1 for the records where that code was found?
Input:
id found code
1 1 001
2 0 v58
3 1 v58
4 1 003
5 0 v58
......
......
15000 0 v58
Output:
id code_001 code_v58 code_003 .......
1 1 0 0
2 0 0 0
3 0 1 0
4 0 0 1
5 0 0 0
.........
.........
You will want to TRANSPOSE the values and name the pivoted columns according to the data (the value of code) with an ID statement.
In real-world data it is often the case that missing diagnoses need to be flagged zero; that has to be done in a subsequent step.
Example:
data have;
/* id 2 has two records: a second diagnosis result for the same patient */
input id found code $;
datalines;
1 1 001
2 0 v58
2 1 003
3 1 v58
4 1 003
5 0 v58
;
proc transpose data=have out=want(drop=_name_) prefix=code_;
by id;
id code; * column name becomes <prefix><code>;
var found;
run;
* missing occurs when an id was not diagnosed with a code;
* if that means the flag should be zero (for logistic modeling perhaps)
* the missings need to be changed to zeroes;
data want;
set want;
array codes code_:;
do _n_ = 1 to dim(codes); /* repurpose automatic variable _n_ for loop index */
if missing(codes(_n_)) then codes(_n_) = 0;
end;
run;
I have a dataset that looks like:
Hour Flag
1 1
2 1
3 .
4 1
5 1
6 .
7 1
8 1
9 1
10 .
11 1
12 1
13 1
14 1
I want to have an output dataset like:
Total_Hours Count
2 2
3 1
4 1
As you can see, I want to count the number of hours included in each period with consecutive "1s". A missing value ends the consecutive sequence.
How should I go about doing this? Thanks!
You'll need to do this in two steps. First step is making sure the data is sorted properly and determining the number of hours in a consecutive period:
PROC SORT DATA = <your dataset>;
BY hour;
RUN;
DATA work.consecutive_hours;
SET <your dataset> END = lastrec;
RETAIN
total_hours 0
;
IF flag = 1 THEN total_hours = total_hours + 1;
ELSE
DO;
IF total_hours > 0 THEN output;
total_hours = 0;
END;
/* Need to output last record */
IF lastrec AND total_hours > 0 THEN output;
KEEP
total_hours
;
RUN;
Now a simple SQL statement:
PROC SQL;
CREATE TABLE work.hour_summary AS
SELECT
total_hours
,COUNT(*) AS count
FROM
work.consecutive_hours
GROUP BY
total_hours
;
QUIT;
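If you prefer to avoid SQL, PROC FREQ should give the same summary. A sketch, assuming the work.consecutive_hours table from the first step; the output name work.hour_summary_freq is made up here. The OUT= table also carries a PERCENT column, which the KEEP= option leaves out.

```sas
proc freq data=work.consecutive_hours noprint;
  /* one output row per distinct run length, with its frequency in COUNT */
  tables total_hours / out=work.hour_summary_freq(keep=total_hours count);
run;
```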
You will have to do two things:
compute the run lengths
compute the frequency of the run lengths
When using the implicit loop, each run length can be computed and maintained in a retained tracking variable: increment it on a non-missing value, and output and reset it on a missing value or at the end of the data. Proc FREQ can then compute the frequency of each run length.
An alternative is to use an explicit loop and a hash for frequency counts.
Example:
data have;
input Hour Flag;
datalines;
1 1
2 1
3 .
4 1
5 1
6 .
7 1
8 1
9 1
10 .
11 1
12 1
13 1
14 1
;
data _null_;
declare hash counts(ordered:'a');
counts.defineKey('length');
counts.defineData('length', 'count');
counts.defineDone();
do until (end);
set have end=end;
if not missing(flag) then
length + 1;
if missing(flag) or end then do;
if length > 0 then do;
if counts.find() eq 0
then count+1;
else count=1;
counts.replace();
length = 0;
end;
end;
end;
counts.output(dataset:'want');
run;
An alternative
data _null_;
if _N_ = 1 then do;
dcl hash h(ordered : "a");
h.definekey("Total_Hours");
h.definedata("Total_Hours", "Count");
h.definedone();
end;
do Total_Hours = 1 by 1 until (last.Flag);
set have end=lr;
by Flag notsorted;
end;
Count = 1;
if Flag then do;
if h.find() = 0 then Count+1;
h.replace();
end;
if lr then h.output(dataset : "want");
run;
Several weeks ago, @Richard taught me how to use a DOW-loop with a direct-addressing array. Today, I pass it on to you.
data want(keep=Total_Hours Count);
array bin[99]_temporary_;
do until(eof1);
set have end=eof1;
if Flag then count + 1;
if (^Flag or eof1) and count then do; /* skip when no run is open, so bin[.] is never addressed */
bin[count] + 1;
count = .;
end;
end;
do i = 1 to dim(bin);
Total_Hours = i;
Count = bin[i];
if Count then output;
end;
run;
And thanks again to Richard, who also suggested this article to me.
My question is related to the topic below:
SAS check field by field.
I'm looking for a method that puts the names of the columns into a string variable when the field has a value of 0.
Is there an elegant method for this?
Best regards!
The VNAME function will return the name of the variable corresponding to an array reference.
data Have;
input REFERENCE_DATE
L_CONTRACT
L_CONTRACT_ACTIVITY
L_LFC
L_CONTRACT_CO_CUSTOMER
L_CONTRACT_OBJECT
L_CUSTOMER
L_CUSTOMER_RETAIL
L_DPD
L_GL_ACCOUNT
L_GL_AMOUNT
L_EXTRA_COST
L_PRODUCT;
datalines;
450 1 9 8 6 0 4 3 0 0 0 0 0
;
data want;
length vars_with_zero $1000;
set have;
array L L_CONTRACT -- L_PRODUCT; /* every L_ variable, in dataset order */
* accumulate the names of the variables that have a zero value;
do _n_ = 1 to dim(L);
if L(_n_) = 0 then vars_with_zero = catx(' ', vars_with_zero, vname(L(_n_)));
end;
run;
I have a dataset in SAS and I want to convert one column into a string by product. I have attached an image of the input and the required output.
I need the column STRING in the output. Can anyone please help me?
I have coded a data step to create the input data:
data have;
input products $
dates
value
;
datalines;
a 1 0
a 2 0
a 3 1
a 4 0
a 5 1
a 6 1
b 1 0
b 2 1
b 3 1
b 4 1
b 5 0
b 6 0
c 1 1
c 2 0
c 3 1
c 4 1
c 5 0
c 6 1
;
Does the following suggested solution give you what you want?
data want;
length string $ 20;
do until(last.products);
set have;
by products;
string = catx(',',string,value);
end;
do until(last.products);
set have;
by products;
output;
end;
run;
Here's my quick solution.
data temp;
length cat $20.;
/* variable names match the HAVE dataset: products, dates, value */
do until (last.products);
set have;
by products notsorted;
cat=catx(',',cat,value);
end;
drop value dates;
run;
proc sql;
create table want as
select have.*, cat as string
from have inner join temp
on have.products=temp.products;
quit;
I have a dataset with patients who have multiple courses during a treatment phase.
Data set looks like:
C 1 1 0
C 0 0 1
C 1 1 0
C 0 0 1
The first two rows: the patient starts at row 1 and finishes at row 2. This is the first course of patient C.
The second two rows: patient C starts again at row 3 and finishes at row 4.
How can I create an identifier for these two courses using the FIRST. and LAST. variables in SAS?
Expected output should look like this:
C 1 1 0 23
C 0 0 1 23
C 1 1 0 24
C 0 0 1 24
C 1 1 1 25
The counts for one course should be the same, and should differ from course to course within the same patient.
Thanks.
Assuming the third variable, whatever it is, is your 'end state' the following works. Probably not the easiest method but hopefully clear. I don't know if First/Last will actually help in this situation except for when the ID switches.
The idea is to look for V3=1 and then set a flag to 1. If the flag is 1, the next record increments the course and resets the flag, and the process continues. RETAIN is used to hold the values of Flag and Course across the rows.
data have;
input ID $ v1-v3;
cards;
C 1 1 0
C 0 0 1
C 1 1 0
C 0 0 1
D 1 0 0
D 0 1 0
D 0 0 1
;
run;
data want;
set have;
BY ID;
retain flag 0 course;
if first.ID then do;
Course=1;
flag=0;
end;
if flag=1 then do;
course=course+1;
flag=0;
end;
if v3=1 then flag=1; /* no ELSE: a single row can both start and end a course */
run;
proc print;
run;