SAS retrieving values from row - sas

I want to split row-level values (the loans associated with each account number) in a SAS table into one row per loan.
An example is below.
Input
Account Number Loans
123 abc, def, ghi
456 jkl, mnopqr, stuv
789 w, xyz
Output
Account Numbers Loans
123 abc
123 def
123 ghi
456 jkl
456 mnopqr
456 stuv
789 w
789 xyz
Loans are separated by commas and do not have a fixed length.

Use countw() to count the number of values on a line and scan() to pick them out.
Both take an optional final argument that specifies the delimiter, which in your case is a comma.
data Loans (keep= AccountNo Loan);
infile datalines truncover;
/* @n is the column pointer; #n would jump to input line n instead */
input @1 AccountNo 3. @5 LoanList $250.;
length Loan $25;
if length(LoanList) gt 240 then put 'WARNING: You might need to extend LoanList';
label AccountNo = 'Account Number' Loan = 'Loans';
do loanNo = 1 to countw(LoanList, ',');
/* strip() removes the blank that follows each comma */
Loan = strip(scan(LoanList, loanNo, ','));
output;
end;
datalines;
123 abc, def, ghi
456 jkl, mnopqr, stuv
789 w, xyz
;
proc print data=Loans label noobs;
run;
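To see what the two functions return, here is a minimal sketch on a single hard-coded string (the variable names n and second are only illustrative). scan() keeps the blank that follows each comma, so strip() is applied to the result.

```sas
data _null_;
    LoanList = 'abc, def, ghi';
    /* number of comma-separated items in the list */
    n = countw(LoanList, ',');
    /* second item, with the leading blank removed */
    second = strip(scan(LoanList, 2, ','));
    put n= second=;
run;
```

The log should show n=3 second=def.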
The reverse operation requires different techniques.
To enable BY-group processing on AccountNo, we must first construct a SAS dataset from the input and then read that back in with a set statement.
data Loans;
infile datalines;
input @1 AccountNo 3. @5 Loan $25.;
datalines;
123 15-abc
123 15-def
123 15-ghi
456 99-jkl
456 99-mnopqr
456 99-stuv
789 77-w
789 77-xyz
;
data LoanLists;
set Loans;
by AccountNo;
Now make Loanlist long enough and override SAS's default behaviour of re-initialising variables for every observation (= row of data).
format Loanlist $250.;
retain Loanlist;
Collect all loans for an account, separating them with a comma and a blank.
if first.AccountNo then Loanlist = Loan;
else Loanlist = catx(', ',Loanlist,Loan);
if length(LoanList) gt 240 then put 'WARNING: you might need to extend LoanList';
Keep only the full list per account.
if last.AccountNo;
drop Loan;
proc print;
run;

Related

SAS flag each row that contains the max value

I tried searching but couldn't exactly find what I was looking for. I have a dataset with multiple rows per ID. I'd like to add a variable called maxdec and show a 1 for each row that has the max dec for each ID.
Sample Dataset:
ID DEC
123 1
123 2
123 2
123 2
456 2
456 3
456 3
Desired Output:
ID DEC MAXDEC
123 1 .
123 2 1
123 2 1
123 2 1
456 2 .
456 3 1
456 3 1
It is easier to define it with 1 or 0 instead of 1 or missing.
proc sql;
create table want as
select id,dec, dec=max(dec) as maxdec
from have
group by id
;
quit;
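proc sort data=have;
by id;
run;
/* nway keeps only the per-ID rows; without it the overall row (missing id) would add an extra observation in the merge */
proc summary data=have nway;
class id;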
var dec;
output out=max_info max=max_value;
run;
data want;
merge have
max_info (keep=id max_value)
;
by id;
if dec=max_value then maxdec=1;
run;
The proc summary calculates the maximum value of DEC for each ID, and outputs as variable MAX_VALUE in dataset MAX_INFO. The subsequent data step assigns MAXDEC=1 if the current value of DEC is equal to MAX_VALUE for that ID.
Here is a DoW loop approach
data have;
input ID DEC;
datalines;
123 1
123 2
123 2
123 2
456 2
456 3
456 3
;
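data want(drop = m);
/* first pass: find the maximum DEC within the ID group */
do _N_ = 1 by 1 until (last.id);
set have;
by id;
m = max(m, dec);
end;
/* second pass: re-read the same rows and flag those equal to the maximum */
do _N_ = 1 to _N_;
set have;
maxdec = ifn(dec = m, 1, .);
output;
end;
run;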

Selecting first group of rows in case second group is duplicate

I have data like below (Dataset name - Have)
ID NAME AMOUNT PREFER
ABC Test1 123 Pref1
ABC Test1 456 Pref1
ABC Test1 789 Pref1
ABC Test1 123 Pref2
ABC Test1 456 Pref2
ABC Test1 789 Pref2
and I want only the first group as output:
ID NAME AMOUNT PREFER
ABC Test1 123 Pref1
ABC Test1 456 Pref1
ABC Test1 789 Pref1
What I have tried so far is a simple data step like this:
Data want;
set have;
by ID PREFER;
if first.PREFER;
run;
This will give me
ID NAME AMOUNT PREFER
ABC Test1 123 Pref1
ABC Test1 123 Pref2
Please suggest something using a data step or PROC SQL.
It sounds as though you probably want something like this:
data have;
input ID $ NAME $ AMOUNT PREFER $;
cards;
ABC Test1 123 Pref1
ABC Test1 456 Pref1
ABC Test1 789 Pref1
ABC Test1 123 Pref2
ABC Test1 456 Pref2
ABC Test1 789 Pref2
;
run;
data want;
set have;
by id;
retain t_prefer;
if first.id then t_prefer = prefer;
if prefer = t_prefer;
drop t_prefer;
run;
The trick is to use a retain statement so that a copy of the value of prefer from the first row per id is carried over between iterations of the data step, and you can then output only rows with that value of prefer.
You could just keep track of which group number the current record is in.
data want;
set have;
by ID PREFER;
if first.id then group=0;
group+first.prefer;
if group=1;
run;

count number of times a value appears in the entire dataset SAS

I am trying to count the number of times each value appears in the entire dataset, across all variables. So I want a table/output that lists each value with the number of times it appears. I have tried proc sql and proc freq without any luck.
data Data1;
input xx yy zz;
datalines;
123 456 234
456 123 345
234 345 123
;
run;
I want table output such as 123 - 3, 234 - 2, etc.
The easiest option (I think) is to create a dataset that puts all the values in a single column, then you can just run a proc freq off that.
data have;
input xx yy zz;
datalines;
123 456 456
456 123 234
234 234 123
;
run;
data single_column;
set have;
array vars{*} xx yy zz;
do i = 1 to dim(vars);
all_vals = vars{i};
output;
end;
keep all_vals;
run;
proc freq data=single_column;
table all_vals / out=want;
run;

Transform numbers with 0 values at the beginning

I have the following dataset:
DATA survey;
INPUT zip_code number;
DATALINES;
1212 12
1213 23
1214 23
;
PROC PRINT; RUN;
I want to link this data to another table but the thing is that the numbers in the other table are stored in the following format: 0012, 0023, 0023.
So I am looking for a way to do the following:
Check how long the number is
If length = 1, add three zeros to the beginning
If length = 2, add two zeros to the beginning
Any thoughts on how I can get this working?
Numbers are numbers so if the other table has the field as a number then you don't need to do anything. 13 = 0013 = 13.00 = ....
If the other table actually has a character variable then you need to convert one or the other.
char_number = put(number, Z4.);
number = input(char_number, 4.);
You can use z#. formats to accomplish this:
DATA survey;
INPUT zip_code number;
DATALINES;
1212 12
1213 23
1214 23
9999 999
8888 8
;
data survey2;
set survey;
number_long = put(number, z4.);
run;
If number is stored as a character variable and you need it to be four characters long, convert it to numeric first and then apply the format:
want = put(input(number,best32.),z4.);
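To link the two tables without storing a new variable, the conversion can also be done directly in the join condition. A minimal sketch, assuming the other table is called other and holds the zero-padded value in a character variable code (these names, and descr, are made up for illustration; survey is the dataset from the question with a numeric number):

```sas
/* Hypothetical lookup table keyed by a zero-padded character code */
data other;
    input code :$4. descr :$12.;
    datalines;
0012 twelve
0023 twentythree
;

proc sql;
    create table linked as
    select s.zip_code, s.number, o.descr
    from survey as s
         left join other as o
         /* convert the numeric value to a 4-character, zero-padded string */
         on put(s.number, z4.) = o.code;
quit;
```

A left join keeps every row of survey, so unmatched numbers show up with a missing descr rather than disappearing.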

Transforming levels of one variable into other variables

I have a dataset that looks something like this:
IDnum State Product Consumption
123 MI A 30
123 MI B 20
123 MI C 45
456 NJ A 15
456 NJ D 10
789 MI B 60
... ... ... ...
And I would like to create a new dataset with one row for each IDnum and a new dummy variable for each distinct product (my real dataset has close to 1000 products), along with its associated consumption. It would look something like this:
IDnum State Prod.A Cons.A Prod.B Cons.B Prod.C Cons.C Prod.D Cons.D
123 MI yes 30 yes 20 yes 45 no -
456 NJ yes 15 no - no - yes 10
789 MI no - yes 60 no - no -
... ... ... ... ... ... ... ... ... ...
Some variables, like "State", don't change within the same IDnum, but each row in the original dataset corresponds to one purchase, hence the changing "Product" and "Consumption" values for the same IDnum. I would like my new dataset to show all the consumption habits of each customer in a single row, but so far I have failed.
Any help would be greatly appreciated.
Without yes/no variables, it's really easy:
data input;
length State $2 Product $1;
input IDnum State Product Consumption;
cards;
123 MI A 30
123 MI B 20
123 MI C 45
456 NJ A 15
456 NJ D 10
789 MI B 60
;
run;
proc transpose data=input out=output(drop=_NAME_) prefix=Cons_;
var Consumption;
id Product;
by IDnum State;
run;
Adding the yes/no fields:
proc sql;/* from column names or alternatively
create it from source data directly if not taking too long */
create table work.products as
select scan(name, 2, '_') as product length=1
from dictionary.columns
where libname='WORK' and memname='OUTPUT'
and upcase(name) like 'CONS_%';
quit;
filename vars temp;/* write a temp file containing variable definitions
in desired order */
data _null_;
set work.products end=last;
file vars;
length str $40;
if _N_ = 1 then put 'LENGTH ';
str = catt('Prod_', product, ' $3');
put str;
str = catt('Cons_', product, ' 8');
put str;
if last then put ';';
run;
options source2;
data output2;
length IdNum 8 State $2;
%include vars;
set output;
array prod{*} Prod_:;
array cons{*} Cons_:;
drop i;
do i=1 to dim(prod);
if coalesce(cons(i), 0) ne 0 then prod(i)='yes';
else prod(i)='no';
end;
run;