SAS: search for a value across columns with an array and extract the values of the next 12 columns

I want to count the number of 'noncure' occurrences across different columns, subject to some conditions, at different position dates. How do I search for 12 consecutive '1's across columns?
[UPDATE]
I've modified my dataset and think this is the best way to populate out my desired results.
This is a sample of my raw data
data have;
input acct $ flg1-flg25;
datalines;
AA 0 0 0 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1
;
run;
The numbers on flg represent months - eg flg1 = jan10, flg2 = feb10 & so on.
To get noncure, certain conditions have to be fulfilled.
flg(i) has to be 0
noncure only happens if there is a minimum of 12 consecutive flg of '1' in the future
an account can have more than 1 noncure incidents
The computation of noncure should look like this (Refer to image for a better view - highlighted in green)
AA 1 1 1 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
noncure1 is 1 because flg1 is 0 and the next run of 12 consecutive 1s starts at flg9
noncure2 is 1 because flg2 is 0 and the next run of 12 consecutive 1s starts at flg9
noncure4 is 0 because flg4 is not 0
noncure23 is 0 because, even though flg23 is 0, no run of 12 consecutive 1s follows it (flg25 is a single '1')
I'm having problems finding the first instance of 12 consecutive '1's after flg(i).
I was thinking of doing an array to populate out position of consecutive 12 (eg nc_pos) then do i to nc_pos - something along the lines of
nc_pos = <search for 12 consecutive occurrence of '1' from flg(i)> **I don't know the code for this**
if flg(i) = 0 then do i to nc_pos;
noncure_tag = 1;
obs_pos = i;
FYI, I have a few hundred thousand accounts with a total of 84 months each, and their starting positions differ (e.g. flg1 could be null and the first 0 or 1 may appear at flg3).
My final output should look something like the image file labelled TARGET highlighted in yellow.
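The rule described above (flg(i) is 0 and a run of at least 12 consecutive 1s starts somewhere after it) can be checked in a single backward scan per account. The following is a minimal sketch of that logic in Python rather than SAS, only to illustrate the idea. The function name `noncure_flags` is hypothetical, and treating a null flag as "breaks the run, never a noncure" is an assumption, since the question does not say how nulls should be scored.

```python
def noncure_flags(flags, min_run=12):
    """Return a 0/1 list: 1 where flags[i] == 0 and a run of at least
    min_run consecutive 1s starts at some later position."""
    n = len(flags)
    result = [0] * n
    run = 0            # length of the run of 1s starting at the current position
    run_ahead = False  # True once a qualifying run has been seen to the right
    for i in range(n - 1, -1, -1):  # scan from the last month back to the first
        if flags[i] == 1:
            run += 1
            if run >= min_run:
                run_ahead = True
        else:
            run = 0  # a 0 or a missing value (None) breaks the run
            if flags[i] == 0 and run_ahead:
                result[i] = 1
    return result


# Account AA from the sample data: flg1 to flg25
flags = [0, 0, 0, 1, 0, 0, 0, 0] + [1] * 14 + [0, 0, 1]
print(noncure_flags(flags))
```

Because each account is scanned once, the cost grows linearly with the number of months. The same backward loop could presumably be written over a SAS array with `do i = dim(flg) to 1 by -1;`.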

Related

How to add bins created by xtabs with zero values

I'm creating frequency tables of my data to compare them. However, further comparisons are not possible because the tables created by xtabs have different lengths. How do I force xtabs to use a specified set of bins and fill in the missing ones with zeros rather than leave them out? For example, using xtabs(~lcat20+AgeBin1, data = datasource), my result was:
      AgeBin1
lcat20 0 1 2 3 4
   100 1 0 0 0 0
   160 5 1 0 0 0
   180 2 3 0 0 0
   200 1 2 0 0 0
lcat20=120 and lcat20=140 are missing because that dataset does not have any samples in those sizes. How do I fill in these missing categories? I appreciate any help.
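One way to think about the fix is to declare the full set of categories up front and count against it, so that empty categories show up as zero rows. In R this is usually done by converting the variable to a factor with explicit levels before tabulating. The sketch below shows the same idea in plain Python. The function name `crosstab_fixed`, the observations, and the assumption that lcat20 runs from 100 to 200 in steps of 20 are all illustrative, chosen only to reproduce the counts shown above.

```python
from collections import Counter


def crosstab_fixed(pairs, row_levels, col_levels):
    """Count (row, col) pairs against fixed level lists; absent cells become 0."""
    counts = Counter(pairs)
    return {r: [counts[(r, c)] for c in col_levels] for r in row_levels}


# Hypothetical observations that reproduce the counts in the table above
pairs = (
    [(100, 0)]
    + [(160, 0)] * 5 + [(160, 1)]
    + [(180, 0)] * 2 + [(180, 1)] * 3
    + [(200, 0)] + [(200, 1)] * 2
)

# Assumed full set of bins: lcat20 = 100, 120, ..., 200 and AgeBin1 = 0..4
table = crosstab_fixed(pairs, row_levels=range(100, 201, 20), col_levels=range(5))
for level, row in table.items():
    print(level, *row)
```

Every table built this way has the same shape, so tables from different datasets can be compared cell by cell.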

Lag function in SAS for checking previous value

In SAS, I would like to create a label that checks the previous sell indicator: if the sell indicator of the previous time period is 1/0 and that of the current period is 0/1 (meaning that it has changed), then I assign the value 1 to the ind variable.
The dataset looks like:
Customer Time Sell_Ind
1 2 1
1 3 0
1 4 0
2 23 0
2 24 0
2 30 0
5 12 1
5 11 0
And so on.
My expected output would be
Customer Time Sell_Ind Ind
1 2 1 0
1 3 0 1
1 4 0 0
2 23 0 0
2 24 0 0
2 30 0 0
5 12 1 0
5 11 0 1
The previous/current check should be done within each customer.
I have tried as follows
data mydata;
set original;
By customer;
Lag_sell_ind=lag(sell_ind);
If first.customer then Lag_sell_ind=.;
Run;
But it does not return the expected output.
In sql I would probably use partition by customer over time but I do not know how to do the same in SAS.
You were halfway there. You only need to add one if statement to get the desired output.
data want;
set have;
by customer;
lag=lag(sell_ind);
if first.customer then lag=.;
if sell_ind ne lag and lag ne . then ind = 1;
else ind = 0;
drop lag;
run;
You can simplify this using the IFN Function like below.
data have;
input Customer Time Sell_Ind;
datalines;
1 2 1
1 3 0
1 4 0
2 23 0
2 24 0
2 30 0
5 12 1
5 11 0
;
data want;
set have;
by customer;
  * IFN evaluates all of its arguments on every row, so LAG is updated for every observation;
  ind = ifn(first.customer, 0, sell_ind ne lag(sell_ind));
run;
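The logic behind both answers is "compare each row with the previous row of the same customer, and never flag the first row of a customer". Here is a minimal sketch of that logic in Python rather than SAS, assuming the rows arrive already grouped by customer as in the sample. The function name `change_indicator` is hypothetical.

```python
def change_indicator(rows):
    """rows: iterable of (customer, time, sell_ind), already grouped by customer.
    Returns the rows with a trailing ind column (1 when sell_ind changed)."""
    out = []
    prev_customer = object()  # sentinel that equals no real customer id
    prev_sell = None
    for customer, time, sell_ind in rows:
        first_in_group = customer != prev_customer
        # The first row of a customer has no previous value, so ind is 0
        ind = 0 if first_in_group else int(sell_ind != prev_sell)
        out.append((customer, time, sell_ind, ind))
        prev_customer, prev_sell = customer, sell_ind
    return out


rows = [
    (1, 2, 1), (1, 3, 0), (1, 4, 0),
    (2, 23, 0), (2, 24, 0), (2, 30, 0),
    (5, 12, 1), (5, 11, 0),
]
for row in change_indicator(rows):
    print(*row)
```

The previous value is updated on every row, including the first row of a group. This mirrors the reason LAG has to be called unconditionally in the SAS versions.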

Matlab - Convert Character Vector Into Number Vector?

I'm trying to implement an oscilloscope for a digital input and send its data over a serial port for debugging. The scope software sends MATLAB a string like "000000111111111000000001111111000000", and I'd like to plot it. Is there any way to split this string into a vector? MATLAB doesn't seem to allow strsplit() without a delimiter, and I'd rather not bog down the communication with a delimiter between each byte.
With MATLAB's weak typing, this is actually quite easy:
>> str = '000000111111111000000001111111000000'
str = 000000111111111000000001111111000000
>> class(str)
ans = char
>> vec = str - '0'
vec =
Columns 1 through 22:
0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0
Columns 23 through 36:
0 1 1 1 1 1 1 1 0 0 0 0 0 0
>> class(vec)
ans = double
This subtracts the ordinal value of the character '0' from each character in the string, leaving the numerical values 0 or 1.
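The same character-code subtraction can be written out explicitly. The sketch below uses Python purely to illustrate the arithmetic. The function name `digits_from_string` is hypothetical, and it assumes the string contains only the characters '0' to '9'.

```python
def digits_from_string(s):
    """Map each digit character to its numeric value via its character code."""
    zero = ord("0")  # character code of '0' (48 in ASCII)
    return [ord(ch) - zero for ch in s]


bits = digits_from_string("000000111111111000000001111111000000")
print(bits)
```

No delimiter is needed because every character is handled on its own, which is also what the MATLAB expression `str - '0'` does element-wise.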
You can use sscanf with a single value width:
a = '000000111111111000000001111111000000'
b = sscanf(a, '%1d');
Which returns:
>> b.'
ans =
Columns 1 through 18
0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 0 0
Columns 19 through 36
0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 0 0
A quick solution is:
data = '000001111111110000000000111111111110000000';
vec = str2double(cellstr(data.').');
It will produce a row vector of numeric values. If you want a column vector as output, just use a single transpose:
vec = str2double(cellstr(data.'));
I'm surprised how difficult this is to do. But here's what I came up with:
str = '000001111111110000000000111111111110000000'; %test string
y = cellfun(@(x) str2num(x), regexp(str,'\d','match'));
plot(y);
regexp() seems to be the only way to go. By default, it returns the indexes of the matches, so you need to specify 'match'. Then you end up with a cell array of strings. The only good way to convert this into a numerical array is one item at a time with str2num().
I hope this helps someone else who assumes, as I did, that there is a straightforward function. And if anyone knows a way to do this without converting my "01...01....01....01....00....00....00....00" stream of bytes into the ASCII representations of the binary numbers: "49.....49.....49....49....48....48....48....48", I'd love to hear it.

How to fix a bug in my homework solution in C++?

I need to write a program that reads the statistics of n League A football teams and prints the names of the teams that fall into League B.
A team falls into League B if it has fewer than k points after having played m weeks, where m is between 1 and 150. Each team gets three points for a win, one point for a draw and zero points for a loss.
Input specification: the first line gives the number of teams 0 < n ≤ 500 and the points 0 < k ≤ 300 needed to stay in League A. Each of the following n lines contains a team name and its results. A semicolon marks the end of each team's series of results.
The number 2 represents a win, 1 a draw and 0 a loss.
Output specification:
Sample Input I
4 19
Team_A 1 1 1 1 1 1 1 1 1 0 1 1 1 0 2 1 0 ;
Team_B 0 1 0 2 2 1 1 0 1 1 0 2 0 1 0 0 2 ;
Team_C 0 0 1 0 2 2 2 1 1 1 1 1 0 0 2 1 2 ;
Team_D 0 1 0 1 2 1 2 1 0 0 0 2 2 2 0 0 0 ;
Sample Output I
Team_A 16
Team_B 18
This is the code I came up with, but the output is wrong and I don't know why,
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
int n,points,sum=0,i,value;
char name[15];
char p;
scanf("%d %d",&n,&points);
for(i=1;i<=n;i++)
{
scanf("%s",&name);
do
{
scanf("%c ",&p);
if(p!=';')
{
value=p-48;
sum=sum+value;
}
}while(p!=';');
if(sum<=points)
printf("%s %d",name,sum);
}
return 0;
}
You might look for problems by stuffing the program with output statements.
If you add after scanf("%c ",&p); an output statement to show the value of p, you will find that the first value for p is a space character, which spoils your calculation.
In the same way, if you trace the value of sum, you will find that you forgot to reset this variable to zero for each team.
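Two further details follow from the problem statement and the sample output rather than from the hints above. A win is coded as 2 but is worth three points, and a team drops only with strictly fewer than k points (Team_D scores exactly 19 and is not printed). A minimal sketch of the intended computation, written in Python rather than C, with the hypothetical names `POINTS` and `relegated`:

```python
POINTS = {"0": 0, "1": 1, "2": 3}  # result code -> league points


def relegated(text):
    """Parse the whole input and return (name, points) for teams below k."""
    tokens = text.split()  # splitting on whitespace avoids stray space characters
    n, k = int(tokens[0]), int(tokens[1])
    pos = 2
    out = []
    for _ in range(n):
        name = tokens[pos]
        pos += 1
        total = 0  # reset the running total for every team
        while tokens[pos] != ";":
            total += POINTS[tokens[pos]]
            pos += 1
        pos += 1  # skip the ';'
        if total < k:  # strictly fewer than k points
            out.append((name, total))
    return out


sample = """4 19
Team_A 1 1 1 1 1 1 1 1 1 0 1 1 1 0 2 1 0 ;
Team_B 0 1 0 2 2 1 1 0 1 1 0 2 0 1 0 0 2 ;
Team_C 0 0 1 0 2 2 2 1 1 1 1 1 0 0 2 1 2 ;
Team_D 0 1 0 1 2 1 2 1 0 0 0 2 2 2 0 0 0 ;
"""
for name, total in relegated(sample):
    print(name, total)
```

In the C program the corresponding changes would be resetting sum inside the for loop, skipping whitespace before reading each character (for example with the format " %c"), adding 3 instead of 2 for a win, and comparing with < rather than <=.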

replace missing value with non-zero values by column

Data:
A B C D E
2 3 4 . .
2 3 0 0 .
0 3 4 1 1
0 . 4 0 1
2 0 0 0 1
Ideal output:
A B C D E
2 3 4 1 1
2 3 0 0 1
0 3 4 1 1
0 3 4 0 1
2 0 0 0 1
For each column, there are only 3 possible values: an arbitrary integer, zero, and missing value.
I want to replace the missing values with the non-zero value in the corresponding column.
If the arbitrary integer is zero, then missing value should be replaced by zero.
In the actual problem, the numbers of rows and columns are not small.
Make two arrays--one with your column names and another with variables to hold the arbitrary integers. Loop through the data set once to get the integers (looping over the columns in the array), then again to output the values, replacing where necessary (again, looping through the columns in the array).
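data want(drop=i int1-int5);
  /* Pass 1: remember the non-zero, non-missing value seen in each column */
  do until (eof);
    set have end=eof;
    array _col a--e;
    array _int int1-int5;
    do i = 1 to dim(_col);
      if _col(i) not in (.,0) then _int(i)=_col(i);
    end;
  end;
  /* Pass 2: re-read the data and fill the gaps; a column with no non-zero value falls back to 0 */
  do until (_eof);
    set have end=_eof;
    do i = 1 to dim(_col);
      if missing(_col(i)) then _col(i)=coalesce(_int(i), 0);
    end;
    output;
  end;
run;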
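The two-pass idea (first collect one fill value per column, then replace the gaps) is sketched below in Python rather than SAS, only to illustrate the logic. The function name `fill_missing_by_column` is hypothetical, missing values are represented as `None`, and a column with no non-zero value is assumed to fall back to 0, as the question requests.

```python
def fill_missing_by_column(rows):
    """Replace None in each column with that column's non-zero value (else 0)."""
    ncols = len(rows[0])
    fill = [0] * ncols  # default fill value when a column holds only 0 or None
    # Pass 1: record the non-zero, non-missing value of each column
    for row in rows:
        for j, value in enumerate(row):
            if value is not None and value != 0:
                fill[j] = value
    # Pass 2: build the output, replacing missing cells with the recorded value
    return [
        [fill[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]


have = [
    [2, 3, 4, None, None],
    [2, 3, 0, 0, None],
    [0, 3, 4, 1, 1],
    [0, None, 4, 0, 1],
    [2, 0, 0, 0, 1],
]
for row in fill_missing_by_column(have):
    print(*row)
```

Both passes touch each cell once, so the approach should scale to the larger tables mentioned in the question.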