replace missing value with non-zero values by column - sas

Data:
A B C D E
2 3 4 . .
2 3 0 0 .
0 3 4 1 1
0 . 4 0 1
2 0 0 0 1
Ideal output:
A B C D E
2 3 4 1 1
2 3 0 0 1
0 3 4 1 1
0 3 4 0 1
2 0 0 0 1
For each column, there are only three possible values: an arbitrary integer, zero, and missing.
I want to replace the missing values with the non-zero value in the corresponding column.
If the arbitrary integer is zero, then the missing values should be replaced by zero.
In the actual problem, the numbers of rows and columns are large.

Make two arrays--one with your column names and another with variables to hold the arbitrary integers. Loop through the data set once to get the integers (looping over the columns in the array), then again to output the values, replacing where necessary (again, looping through the columns in the array).
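data want(drop=i int1-int5);
  /* Pass 1: remember the non-zero, non-missing value of each column */
  do until (eof);
    set have end=eof;
    array _col a--e;
    array _int int1-int5;
    do i = 1 to dim(_col);
      if _col(i) not in (.,0) then _int(i)=_col(i);
    end;
  end;
  /* Pass 2: fill the gaps; coalesce() falls back to 0 when a column never held a non-zero value */
  do until (_eof);
    set have end=_eof;
    do i = 1 to dim(_col);
      if missing(_col(i)) then _col(i)=coalesce(_int(i),0);
    end;
    output;
  end;
run;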
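To try the step above, the sample table from the question can be loaded as a dataset named have. This is only a sketch: the lowercase variable names a through e are an assumption, chosen to match the a--e list used in the array statement.

```sas
/* Sample input from the question; a period is read as a missing numeric value */
data have;
  input a b c d e;
  datalines;
2 3 4 . .
2 3 0 0 .
0 3 4 1 1
0 . 4 0 1
2 0 0 0 1
;
run;
```

The a--e list selects variables by their position in the dataset, so it relies on a through e being created in this order with nothing in between.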

Related

sas search value across column with array and extract values of next 12 columns

I want to count the number of 'noncure' occurrences across different columns, subject to some conditions, at different position dates. How do I search for the occurrence of 12 consecutive '1's across columns?
[UPDATE]
I've modified my dataset and think this is the best way to populate out my desired results.
This is a sample of my raw data
data have;
input acct $ flg1-flg25;
datalines;
AA 0 0 0 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1
;
run;
The numbers on flg represent months - eg flg1 = jan10, flg2 = feb10 & so on.
To get noncure, certain conditions have to be fulfilled.
flg(i) has to be 0
noncure only happens if there is a minimum of 12 consecutive flg of '1' in the future
an account can have more than 1 noncure incidents
The computation of noncure should look like this (Refer to image for a better view - highlighted in green)
AA 1 1 1 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
noncure1 is 1 because flg1 is 0 and the next run of 12 consecutive 1s starts at flg9
noncure2 is 1 because flg2 is 0 and the next run of 12 consecutive 1s starts at flg9
noncure4 is 0 because flg4 is not 0
noncure23 is 0 because, even though flg23 is 0, no run of 12 consecutive 1s follows it (there is only a single '1', at flg25)
I'm having problems searching for my first instance of consecutive 12 '1' at flg(i).
I was thinking of doing an array to populate out position of consecutive 12 (eg nc_pos) then do i to nc_pos - something along the lines of
nc_pos = <search for 12 consecutive occurrence of '1' from flg(i)> **I don't know the code for this**
if flg(i) = 0 then do i to nc_pos;
noncure_tag = 1;
obs_pos = i;
FYI I have a few hundred thousand accounts with a total of 84 months, and their starting positions differ (eg flg1 could be null and the first 0 or 1 may appear at flg3).
My final output should look something like the image file labelled TARGET highlighted in yellow.
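One possible way to approach the search, offered as a sketch rather than a tested solution: scan each row's flags from right to left, keeping a running count of consecutive 1s and remembering whether a run of at least 12 has already been seen further to the right. The names run_len, found and noncure1-noncure25 are hypothetical. The sketch also assumes that any later run of 12 counts (not only the immediately following one), and that a missing flag breaks a run and gets noncure = 0.

```sas
data want (drop=i run_len found);
  set have;
  array flg{25} flg1-flg25;
  array noncure{25} noncure1-noncure25;
  run_len = 0;   /* length of the run of 1s starting at the current position */
  found   = 0;   /* 1 once a run of 12 or more 1s has been seen to the right */
  do i = dim(flg) to 1 by -1;
    if flg{i} = 1 then do;
      run_len = run_len + 1;
      if run_len >= 12 then found = 1;
      noncure{i} = 0;                 /* flg(i) must be 0 to qualify */
    end;
    else do;
      run_len = 0;                    /* a 0 or a missing value breaks the run */
      noncure{i} = (flg{i} = 0 and found = 1);
    end;
  end;
run;
```

For the sample account this gives noncure1-noncure3 and noncure5-noncure8 equal to 1 and every other noncure equal to 0, matching the row shown above. For the full data the two array bounds would change from 25 to 84. Each row is scanned once, so the cost grows linearly with the number of accounts.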

How to recognise a particular sequence in a dataset and mark it?

How to recognize the first "1,0" sequence in column "Flag" from each group and mark a "1" just like it in column "Flag2"?
ID Flag Flag2
1 1
1 1 1
1 0
1 1
1 0
1 0
2 1
2 1
2 1
2 1 1
2 0
2 0
3 0
3 0
3 0
3 0
4 1
4 1 1
4 0
4 1
The problem requires a 'lead' concept (the value from the next row), analogous to the 'lag' concept provided by the lag function. There is no built-in lead function, so you need to be creative.
Merge the data to itself, without a by statement, where the second version is:
Offset by one row by the firstobs data set option
Renames the variables so the lead state can be established with an if
A retained variable tracks if the 1,0 transition has been observed within the group.
Sample code:
data have;
input ID Flag;
datalines;
1 1
1 1
1 0
1 1
1 0
1 0
2 1
2 1
2 1
2 1
2 0
2 0
3 0
3 0
3 0
3 0
4 1
4 1
4 0
4 1
run;
data want;
  merge
    have
    have(firstobs=2 rename=(id=lead_id flag=lead_flag))
  ;
  retain flagged_id;
  if (id=lead_id)                    /* lead is in same group */
    and (flag=1) and (lead_flag=0)   /* transition identified */
    and (flagged_id ne id) then      /* first such transition for group */
  do;
    flag2=1;          /* flag the lead transition */
    flagged_id = id;  /* track id where transition last flagged */
  end;
  drop lead_: flagged:;
run;

How to count the length of a nonzero sequence in SAS

I'd like to count the length of each non-zero sequence in a dataset like the one below:
ID Value
1 0
1 0
1 2.5
1 3
1 0
1 4
1 2
1 5
1 0
So here the length of the first non zero sequence is 2 and the length of the second non zero sequence is 3. The new data will look like this:
ID Value Length
1 0 0
1 0 0
1 2.5 2
1 3 2
1 0 0
1 4 3
1 2 3
1 5 3
1 0 0
How can I write SAS code to accomplish this task for a large dataset like this? Thanks!
Here is one possible solution. It assumes there are no missing values in the Value variable and that your ID variable does not have any significance for this problem.
*creates new length variable that starts at 1 and increments by 1 from start to end of every non-zero sequence;
data step_one (drop=prev_val);
  set orig_data;
  retain prev_val length 0;
  indx = _n_;
  if value ne 0 and prev_val ne 0 then length = length + 1;
  else if value ne 0 then length = 1;
  else if value = 0 then length = 0;
  prev_val = value;
run;
*sorts dataset in reverse order;
proc sort data=step_one;
by descending indx;
run;
*creates modified length variable that carries maximum length value for each sequence down to all observations included in that sequence;
data step_two (drop=length prev_length rename=(length_new=length));
  set step_one;
  retain length_new prev_length 0;
  if length = 0 then length_new = 0;
  else if length ne 0 and prev_length = 0 then
    length_new = length;
  prev_length = length;
run;
*re-sorts dataset back to its original order and outputs final dataset with just the needed variables;
proc sort data=step_two out=final_result (keep=ID value length);
by indx;
run;

SAS: Getting variable name of last non-zero observation

I'm trying to figure this out. I have a table as follows and I'm trying to populate the final column with the variable name of the last non-zero value (as shown in final column):
ID MTH_1 MTH_2 MTH_3 MTH_4 MTH_5 MONTH_LAST_BALANCE
--------------------------------------------------------------
1 10 0 10 20 10 MTH_5
2 5 10 15 5 0 MTH_4
3 5 10 5 0 0 MTH_3
4 1 2 3 1 0 MTH_4
5 1 0 0 0 0 MTH_1
I'm guessing I need to use some sort of array to make this work but I don't know. As per row 1, I need the last non-zero value only, not the left-most one that some other code seems to retrieve.
Any help would be much appreciated.
Cheers
data want ;
  set have ;
  /* Load MTH_1 to MTH_5 into array */
  array m{*} MTH_1-MTH_5 ;
  /* $32 is long enough for any variable name, e.g. MTH_10 and beyond */
  length MONTH_LAST_BALANCE $32 ;
  /* Iterate over array; a later non-zero element overwrites an earlier one */
  do i = 1 to dim(m) ;
    /* Use vname function to get variable name from array element */
    if m{i} not in (., 0) then MONTH_LAST_BALANCE = vname(m{i}) ;
  end ;
  drop i ;
run ;

Counting certain values in rows in SAS table

I'm trying to count certain values in rows and need a little help!
I have a table that looks like this:
data test;
input a b c d;
cards;
1 0 9 1
1 1 0 0
0 9 1 1
0 0 9 1
1 0 9 9
0 1 1 0
1 9 9 1
1 9 0 0
0 0 9 1
9 1 0 0
;
run;
Variables a, b, c and d can have values 1, 0 or 9. Now I need to make a new variable that has a value of 1 when there are two or more values of 9 in a row. How do I do this?
Your question needs clarifying... do you mean two 9's anywhere in a single row, or two 9's in a row (i.e. consecutively)?
A simple way is to concatenate (using cats()) all the values into a string, and use the index() function to check for the '99', or count() to count the 9's...
data want ;
  set test ;
  array all{*} a b c d ;
  vallist = cats(of all{*}) ;
  has99 = (index(vallist,'99') > 0) ; /* flag any two consecutive 9's */
  two9s = (count(vallist,'9') >= 2) ; /* two or more 9's */
  drop vallist ;
run ;
Here's one way you could do it. Sum your rows and store it in a new variable, e, then if that sum is 18 or larger then you know there has to be at least 2 9's.
data test;
set test;
e = a+b+c+d;
IF e >= 18 THEN f = 1;
ELSE f = 0;
DROP e;
run;
Try this:
data want;
set test;
flag=sum(of _all_)>=18;
run;