Generating Unique ID for same group - sas

I have a data set:
CustID Rating
1 A
1 A
1 B
2 A
2 B
2 C
2 D
3 X
3 X
3 Z
4 Y
4 Y
5 M
6 N
7 O
8 U
8 T
8 U
And I expect this output:
CustID Rating ID
1 A 1
1 A 1
1 B 2
2 A 1
2 B 2
2 C 3
2 D 4
3 X 1
3 X 1
3 Z 2
4 Y 1
4 Y 1
5 M 1
6 N 1
7 O 1
8 U 1
8 T 2
8 U 1

In the solution below, I select the distinct ratings into a macro variable that is then used to initialize an array. Each observation's rating is searched for in that array, and the index of the matching element becomes the ID.
If you know the number of distinct ratings beforehand (3 here), you can hard-code the array size and skip the macro. The %sysfunc(countw(...)) call computes the size for you when you don't know it.
data have;
input CustomerID Rating $;
cards;
1 A
1 A
1 B
2 A
2 A
3 A
3 A
3 B
3 C
;
run;
proc sql noprint;
select distinct quote(strip(rating)) into :list separated by ' '
from have
order by 1;
%put &list.;
quit;
If you know the number beforehand:
data want;
set have;
array num(3) $ _temporary_ (&list.);
do i = 1 to dim(num);
if findw(rating,num(i),'tips')>0 then id = i;
end;
drop i;
run;
Otherwise:
%macro Y;
data want;
set have;
array num(%sysfunc(countw(&list., %str( )))) $ _temporary_ (&list.);
do i = 1 to dim(num);
if findw(rating,num(i),'tips')>0 then id = i;
end;
drop i;
run;
%mend;
%Y;
The output:
Obs CustomerID Rating id
1 1 A 1
2 1 A 1
3 1 B 2
4 2 A 1
5 2 A 1
6 3 A 1
7 3 A 1
8 3 B 2
9 3 C 3

Assuming the data is sorted by customerid and rating (as in the original, unedited question), is the following what you want?
data want;
set have;
by customerid rating;
if first.customerid then
id = 0;
if first.rating then
id + 1;
run;
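If the ratings are not sorted within a customer (as for CustID 8 in the question, where U, T, U should give 1, 2, 1), the BY-group approach above needs a sort first. A minimal sketch of an alternative, assuming the data is at least grouped by CustID, is to remember the ratings already seen for the current customer in a hash object. The names seen and next_id are made up for illustration.

```sas
data have;
  input CustID Rating $;
  datalines;
3 X
3 X
3 Z
8 U
8 T
8 U
;
run;

data want;
  set have;
  by CustID;
  /* Build the hash once: key = Rating, data = the ID assigned to it */
  if _n_ = 1 then do;
    declare hash seen();
    seen.defineKey('Rating');
    seen.defineData('ID');
    seen.defineDone();
  end;
  /* New customer: forget earlier ratings and restart the counter */
  if first.CustID then do;
    seen.clear();
    next_id = 0;
  end;
  /* find() returns 0 and loads ID if this rating was already seen;
     otherwise assign the next number and store it */
  if seen.find() ne 0 then do;
    next_id + 1;
    ID = next_id;
    seen.add();
  end;
  drop next_id;
run;
```

IDs are numbered in order of first appearance within each customer, so no sort on Rating is needed.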

Related

SAS - Split single column into two based on value of an ID column

I have data which is as follows.
data have;
input group replicate $ sex $ count;
datalines;
1 A F 3
1 A M 2
1 B F 4
1 B M 2
1 C F 4
1 C M 5
2 A F 5
2 A M 4
2 B F 6
2 B M 3
2 C F 2
2 C M 2
3 A F 5
3 A M 1
3 B F 3
3 B M 4
3 C F 3
3 C M 1
;
run;
I want to break the count column into two separate columns based on gender.
Obs group replicate count_female count_male
1 1 A 3 2
2 1 B 4 2
3 1 C 4 5
4 2 A 5 4
5 2 B 6 3
6 2 C 2 2
7 3 A 5 1
8 3 B 3 4
9 3 C 3 1
This can be done by first creating two separate data sets for each level of sex and then performing a merge.
data just_female;
set have;
where sex = 'F';
rename count = count_female;
run;
data just_male;
set have;
where sex = 'M';
rename count = count_male;
run;
data want;
merge
just_female
just_male
;
by
group
replicate
;
keep
group
replicate
count_female
count_male
;
run;
Is there a less verbose way to do this that doesn't require sorting or explicitly dropping/keeping variables?
You can do this using proc transpose but you will need to sort the data. I believe this is what you're looking for though.
proc sort data=have;
by group replicate;
run;
The data is sorted so now you have your by-group for transposing.
proc transpose data=have out=want(drop=_name_) prefix=count_;
by group replicate;
id sex;
var count;
run;
proc print data=want;
run;
Then you get:
Obs group replicate count_F count_M
1 1 A 3 2
2 1 B 4 2
3 1 C 4 5
4 2 A 5 4
5 2 B 6 3
6 2 C 2 2
7 3 A 5 1
8 3 B 3 4
9 3 C 3 1
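If the data already arrives in group/replicate order, as the sample does, the three DATA steps from the question can also be collapsed into one by merging the data set with itself and filtering with data set options. This is only a sketch under that ordering assumption, shown on a shortened copy of the sample data. It also assumes at most one F row and one M row per group/replicate pair.

```sas
data have;
  input group replicate $ sex $ count;
  datalines;
1 A F 3
1 A M 2
1 B F 4
1 B M 2
;
run;

data want;
  /* Read the same data set twice: once for each sex, renaming count on the way in */
  merge have(where=(sex = 'F') rename=(count = count_female))
        have(where=(sex = 'M') rename=(count = count_male));
  by group replicate;
  drop sex;
run;
```

The result has the columns group, replicate, count_female and count_male, matching the names in the question rather than the count_F/count_M names produced by proc transpose.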

Using a sas lookup table when the column number changes

I have two sas datasets,
Table 1
col1 col2 col3 col4 col5
.    1    2    3    4
1    5    8    6    1
2    5    9    7    1
3    6    9    7    1
4    6    9    7    2
Table 2
a b
1 1
1 4
4 3
2 1
2 2
where Table 1 is a lookup table for the values a and b in Table 2, from which I want to build a new column c. The value of a matches col1 of Table 1 (the row labels) and b matches its first row (the column labels), i.e. the new column c in Table 2 should read 5, 1, 7, 5, 9. How can I achieve this in SAS? I was thinking of reading Table 1 into a 2D array and then computing c = array(a, b), but I can't get it to work.
Here's an IML solution, first, as I think this is really the 'best' solution for you - you're using a matrix, so use the matrix language. I'm not sure if there's a non-loop method - there may well be; if you want to find out, I would add the sas-iml tag to the question and see if Rick Wicklin happens by the question.
data table1;
input col1 col2 col3 col4 col5 ;
datalines;
. 1 2 3 4
1 5 8 6 1
2 5 9 7 1
3 6 9 7 1
4 6 9 7 2
;;;;
run;
data table2;
input a b;
datalines;
1 1
1 4
4 3
2 1
2 2
;;;;
run;
proc iml;
use table1;
read all var _ALL_ into table1[colname=varnames1];
use table2;
read all var _ALL_ into table2[colname=varnames2];
print table1;
print table2;
table3 = j(nrow(table2),3);
table3[,1:2] = table2;
do _i = 1 to nrow(table3);
table3[_i,3] = table1[table3[_i,1]+1,table3[_i,2]+1];
end;
print table3;
quit;
Here is the temporary array solution. It's not all that pretty. If speed is an issue you don't have to loop over the array to insert it, you can use direct memory access, but I don't want to do that unless speed is a huge issue (and if it is, you should use a better data structure first).
data table3;
set table2;
array _table1[4,4] _temporary_;
if _n_ = 1 then do;
do _i = 1 by 1 until (eof);
set table1(firstobs=2) nobs=_nrows end=eof;
array _cols col2-col5;
do _j = 1 to dim(_cols);
_table1[_i,_j] = _cols[_j];
end;
end;
end;
c = _table1[a,b];
keep a b c;
run;
Just use the POINT= option on a SET statement to pick the row. You can then use an ARRAY to pick the column.
data table1 ;
input col1-col4 ;
cards;
5 8 6 1
5 9 7 1
6 9 7 1
6 9 7 2
;
data table2 ;
input a b ;
cards;
1 1
1 4
4 3
2 1
2 2
;
data want ;
set table2 ;
p=a ;
set table1 point=p ;
array col col1-col4 ;
c=col(b);
drop col1-col4;
run;

How to join multiple columns into one in sas

I have a time series SAS dataset and I want to transform it into a vertical dataset.
My data looks like this:
ID A2009 A2010 A2011 A2012
1 1 2 3 4
2 1 2 3 4
3 1 2 3 4
4 1 2 3 4
5 1 2 3 4
data multcol;
infile datalines;
input ID A2009 A2010 A2011 A2012 A2013;
return;
datalines;
1 1 2 3 4 5
2 1 2 3 4 5
3 1 2 3 4 5
4 1 2 3 4 5
5 1 2 3 4 5
;
run;
proc print data=multcol noobs;
run;
I searched the web and only found the following solution, which did not work for me.
My dataset is too large, and this method shut down my computer.
data cmbcol(keep=a orig_varname orig_obsnum);
set multcol;
array myvars _numeric_;
do i = 2 to dim(myvars);
orig_varname = vname(myvars(i));
orig_obsnum = _n_;
A = myvars(i);
output;
end;
run;
proc print data=cmbcol ;
title 'cmbcol';
run;
proc sort data=cmbcol;
by orig_varname a;
run;
proc print data=cmbcol noobs;
title 'cmbcol';
run;
And I want them to become like this.
ID t t+1
1 1 2
2 1 2
3 1 2
4 1 2
5 1 2
1 2 3
2 2 3
3 2 3
4 2 3
5 2 3
1 3 4
2 3 4
3 3 4
4 3 4
5 3 4
How can we do that?
Thanks in advance.
That is an unusual data structure for sure, but you could achieve this using the following macro (adjust to your needs).
options validvarname = any;
%macro transp;
%let i = 2009;
%do %while (&i <= 2011);
%let j = %eval(&i + 1);
data part_&i(rename = (A&i = t A&j = 't+1'n));
set multcol(keep = ID A&i A&j);
run;
%let i = %eval(&i + 1);
%end;
data combined;
set part_:;
run;
proc datasets nolist nodetails;
delete part_:;
quit;
%mend transp;
%transp
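For comparison, the same pairs of consecutive years can be produced in a single pass without a macro by walking an array of the year columns. This is a rough sketch, not a drop-in replacement: the second column is named t_plus_1 (a made-up name, used instead of 't+1'n so that validvarname=any is not needed), and the rows come out grouped by ID rather than by year pair.

```sas
data multcol;
  input ID A2009 A2010 A2011 A2012;
  datalines;
1 1 2 3 4
2 1 2 3 4
;
run;

data combined;
  set multcol;
  array yrs[*] A2009-A2012;
  /* One output row per pair of consecutive years */
  do k = 1 to dim(yrs) - 1;
    t = yrs[k];
    t_plus_1 = yrs[k + 1];
    output;
  end;
  keep ID t t_plus_1;
run;
```

Each input row yields three output rows here: (1, 2), (2, 3) and (3, 4). If the year-pair ordering of the question matters, keeping k and sorting by k and ID afterwards would restore it, at the cost of a sort.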

SAS identify combinations of two variables using count

I have the following dataset
data input;
input Row$ A B;
datalines;
1 1 2
2 1 2
3 1 1
4 1 1
5 2 3
6 2 3
7 2 3
8 2 2
9 2 2
10 2 1
;
run;
My goal is only to keep records of the first group of data for the variable A. For example I only want records where A=1 and B=2 (lines 1 and 2) and for the next group where A=2 and B=3 and so on...
I tried the following code
data input (rename= (count=rank_b));
set input;
count + 1;
by A descending B;
if first.B then count = 1;
run;
which just gives the number of observations in A (1 to 4) and B (1 to 6). What I would like is
A B rank_b rank_b_desired
1 2 1 1
1 2 2 1
1 1 1 2
1 1 2 2
2 3 1 1
2 3 2 1
2 2 1 2
2 2 2 2
2 1 1 3
So that I can then eliminate all obs where rank_b_desired does not equal 1.
Set a flag to 1 when you encounter a new value of A, then set it to 0 if B changes. retain will preserve the value of the flag when a new line is read from the input.
data want;
set input;
by A descending B;
retain flag;
if first.B then flag = 0;
if first.A then flag = 1;
run;
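Since the goal is to drop everything outside the first B group of each A, the flag can be used as a subsetting condition in the same step. A small sketch on the data from the question, assuming it is already in A / descending B order:

```sas
data input;
  input Row $ A B;
  datalines;
1 1 2
2 1 2
3 1 1
4 1 1
5 2 3
6 2 3
7 2 3
8 2 2
9 2 2
10 2 1
;
run;

data want;
  set input;
  by A descending B;
  retain flag;
  /* A new A starts the group we keep; a later change in B ends it */
  if first.A then flag = 1;
  else if first.B then flag = 0;
  /* Subsetting IF: only flagged rows are written out */
  if flag;
  drop flag;
run;
```

On this data the step keeps rows 1, 2, 5, 6 and 7.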
The desired result can also be achieved via proc sql, with the added benefit that it does not depend on the data being presorted.
proc sql;
create table want as
select *
from input
group by A
having B = max(B)
order by Row;
quit;
Or to match user234821's output:
proc sql;
create table want as
select
*,
ifn(B = max(B), 1, 0) as flag
from input
group by A
order by Row;
quit;

How to show variable value by proc tabulate in sas?

How can I get proc tabulate to show the value of a variable that may be missing, instead of a statistic? Thanks!
For example, I want to show the value of sym, which is either 'x' or missing. How can I do it?
Sample code:
data test;
input tx mod bm $ yr sym $;
datalines;
1 1 a 0 x
1 2 a 0 x
1 3 a 0 x
2 1 a 0 x
2 2 a 0 x
2 3 a 0 x
3 1 a 0
3 2 a 0
3 3 a 0 x
1 1 b 0 x
1 2 b 0
1 3 b 0
1 4 b 0
1 5 b 0
2 1 b 0
2 2 b 0
2 3 b 0
2 4 b 0
2 5 b 0
3 1 b 0 x
3 2 b 0
3 3 b 0
1 1 c 0
1 2 c 0 x
1 3 c 0
2 1 c 0
2 2 c 0
2 3 c 0
3 1 c 0
3 2 c 0
3 3 c 0
1 3 a 1 x
2 3 a 1
3 3 a 1
1 3 b 1
2 3 b 1
3 3 b 1
1 3 c 1 x
2 3 c 1
3 3 c 1
;
run;
proc tabulate data=test;
class yr bm tx mod ;
var sym;
table yr*bm, tx*mod;
run;
proc tabulate data=test;
class tx mod bm yr sym;
table yr*bm, tx*mod*sym*n;
run;
That gives you ones for each SYM=x (since n=missing). That hides the rows for SYM=missing, hence you miss some values overall from your example table. (You could format the column with a format that defines 1 = 'x' easily).
proc tabulate data=test;
class tx mod bm yr;
class sym /missing;
table yr*bm, tx*mod*sym=' '*n;
run;
That gives you all of your combinations of the 4 main variables, but includes missing syms as their own column.
If you want to have your cake and eat it too, then you need to redefine SYM to be a numeric variable, so you can use it as a VAR.
proc format;
invalue ISYM
x=1
;
value FSYM
1='x';
quit;
data test;
infile datalines truncover;
input tx mod bm $ yr sym :ISYM.;
format sym FSYM.;
datalines;
1 1 a 0 x
1 2 a 0 x
1 3 a 0 x
... more lines ...
;
run;
proc tabulate data=test;
class tx mod bm yr;
var sym;
table yr*bm, tx*mod*sym*sum*f=FSYM.;
run;
All of these assume these are unique combination rows. If you start having multiples of yr*bm*tx*mod, you would have a problem here as this wouldn't give you the expected result (sum 1+1+1=3 would not give you an 'x').
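One possible way around the duplicate-row problem, offered as a sketch rather than a tested recommendation, is to ask for MAX instead of SUM: the maximum of several 1s is still 1, so the FSYM. format still prints 'x'. The tiny data set below is made up to include a duplicated yr*bm*tx*mod combination.

```sas
proc format;
  invalue ISYM 'x' = 1;
  value FSYM 1 = 'x';
run;

data test;
  infile datalines truncover;
  input tx mod bm $ yr sym :ISYM.;
  datalines;
1 1 a 0 x
1 1 a 0 x
1 2 a 0
2 1 a 0 x
;
run;

proc tabulate data=test;
  class tx mod bm yr;
  var sym;
  /* MAX stays at 1 however many 'x' rows share the same cell */
  table yr*bm, tx*mod*sym=' '*max=' '*f=FSYM.;
run;
```

This only makes sense while sym takes a single non-missing value. With more than one code, MAX would silently pick the highest.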