I want to add a row number column to a SAS data set based on the values of two columns.
The Type_1 and Type_2 columns are what I have, and Row Number is what I need:
Type_1 Type_2 Row Number
A 1 1
A 1 2
A 2 1
A 2 2
B 1 1
B 2 1
B 2 2
B 3 1
C 1 1
C 1 2
C 2 1
C 3 1
C 4 1
C 4 2
I have this code to number the rows within each group of one column's values:
data work.want;
   set work.have;
   by type_1 notsorted;
   rownumber + 1;
   if first.type_1 then rownumber=1;
run;
But I don't know how to extend this to group by multiple columns. I know I could concatenate type_1 and type_2 and the code above would work, but I would like to do it without making a helper column. Is there any way to change the data step so it works? Or is there another SAS function I don't know of that can accomplish this?
If you want to reset the counter on any change in either TYPE_1 or TYPE_2, just test the FIRST. flag of the last variable in the BY list: a change in any earlier BY variable also sets the FIRST. flag of every variable after it.
data work.want;
   set work.have;
   by type_1 type_2 notsorted;
   rownumber + 1;                      /* sum statement: retained across rows */
   if first.type_2 then rownumber=1;   /* restart at each new TYPE_1/TYPE_2 group */
run;
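To see why testing FIRST.TYPE_2 alone is enough, the sketch below copies the FIRST. flags into ordinary variables (F1 and F2, hypothetical names) for a tiny hypothetical dataset in which TYPE_1 changes while TYPE_2 stays the same. It assumes TYPE_1 is character and TYPE_2 is numeric.

```sas
/* Hypothetical three-row input: the third row changes TYPE_1 but keeps TYPE_2 = 1 */
data work.demo;
   input type_1 $ type_2;
   datalines;
A 1
A 1
B 1
;
run;

data work.demo_flags;
   set work.demo;
   by type_1 type_2 notsorted;
   f1 = first.type_1;                  /* 1 when a new TYPE_1 run starts */
   f2 = first.type_2;                  /* also 1 there, because TYPE_1 changed */
   rownumber + 1;                      /* sum statement: retained, starts at 0 */
   if first.type_2 then rownumber = 1; /* restart the count for each group */
run;

proc print data=work.demo_flags noobs;
run;
```

On the third row F2 is 1 even though TYPE_2 did not change, so ROWNUMBER restarts at 1 for the B/1 group.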
I have a dataset that looks basically like this:
LOCID  Name  Addtl Loc 1  Addtl Loc 2  Addtl Loc 3
1      A     2            3            5
1      B     2
1      C     2            4
And I would like to make it look like this:
LOCID  Name  Gender
1      A     F
2      A     F
3      A     F
5      A     F
1      B     M
2      B     M
1      C     F
2      C     F
4      C     F
So, I'd like to keep the attributes for each person but have a row for each of their locations. I also don't currently have a unique ID or any variable to identify each of the people but I could make one. I'm working in SAS. Does anyone have suggestions on how to do this?
I have been looking up wide to long methods but am having trouble understanding them.
It looks to me like you could just use a DO loop to transpose the data.
Assuming your input data set has LOCID and ADD_LOCID1 to ADD_LOCID3, plus any other variables such as NAME and GENDER, you can output an extra observation for every non-missing value found in the extra LOCID variables:
data want;
   set have;
   array list add_locid1 - add_locid3;
   output;                                /* row for the original LOCID */
   do index=1 to dim(list);
      locid = list[index];
      if not missing(locid) then output;  /* one extra row per additional location */
   end;
   drop index add_locid1-add_locid3;
run;
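Since the question mentions wide-to-long methods, here is a sketch of the same reshape done with PROC TRANSPOSE instead of a DO loop. It assumes the ADD_LOCID1-ADD_LOCID3 names used above, exactly one input row per person, and that NAME identifies each person (if it does not, create an ID first). Any other per-person variables such as GENDER would need to be added to the BY statement to be carried along. The dataset names are hypothetical.

```sas
data have_wide;
   input locid name $ add_locid1 add_locid2 add_locid3;
   datalines;
1 A 2 3 5
1 B 2 . .
1 C 2 4 .
;
run;

/* One output row per (NAME, source column); the value lands in COL1 */
proc transpose data=have_wide out=long;
   by name;                          /* HAVE_WIDE is already sorted by NAME */
   var locid add_locid1-add_locid3;
run;

data want_long;
   set long(rename=(col1=locid));
   if not missing(locid);            /* drop the empty location slots */
   drop _name_;
run;
```

The DO loop avoids the intermediate dataset; PROC TRANSPOSE is handy when the number of location columns varies, since only the VAR list changes.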
I am trying to group by a non-unique variable (A) together with a discrete variable (B) to get the unique combinations of B for each value of A. For example:
A B
1 a
1 b
2 a
2 a
3 a
4 b
4 d
5 c
5 e
I want:
A Unique_combos
1 a, b
2 a
3 a
4 b, d
5 c, e
My current attempt is something along the lines of:
proc sql outobs=50;
title 'Unique Combinations of b per a';
select a, b
from mylib.mydata
group by distinct a;
run;
If you are happy to use a DATA step instead of PROC SQL, you can use the RETAIN statement combined with FIRST./LAST. processing:
Example data:
data have;
attrib b length=$1 format=$1. informat=$1.;
input a
b $
;
datalines;
1 a
1 b
2 a
2 a
3 a
4 b
4 d
5 c
5 e
;
run;
Eliminate duplicates and make sure the data is sorted for first/last processing:
proc sql noprint;
create table tmp as select distinct a,b from have order by a,b;
quit;
Iterate over the distinct list and concatenate the values of b together:
data want;
length combinations $200; * ADJUST TO BE BIG ENOUGH TO STORE ALL THE COMBINATIONS;
set tmp;
by a;
retain combinations '';
if first.a then do;
combinations = '';
end;
combinations = catx(', ',combinations, b);
if last.a then do;
output;
end;
drop b;
run;
Result:
combinations a
a, b 1
a 2
a 3
b, d 4
c, e 5
If you only need the distinct A/B pairs (one row per pair rather than a comma-separated list), just add the DISTINCT keyword to the SELECT clause, e.g.:
title 'Unique Combinations of b per a';
proc sql outobs=50;
select distinct a, b
from mylib.mydata;
The RUN statement is unnecessary: PROC SQL is normally ended with QUIT. I personally never use it, though, because each statement executes as soon as its semicolon is reached, and the procedure ends anyway at the next step boundary.
In SAS, I have the following dataset:
id A
1 2
1 3
2 1
3 1
3 2
ID identifies each individual, and A is a categorical variable that takes the values 1, 2, or 3. I want one observation per individual, with A split into three indicator variables, say A1, A2, and A3.
The result would look like this:
id A1 A2 A3
1 0 1 1
2 1 0 0
3 1 1 0
Does anyone have any thoughts on how to do this in a DATA step rather than in SQL? Thanks in advance.
PROC TRANSPOSE is a good way to go here:
data temp;
input id A;
datalines;
1 2
1 3
2 1
3 1
3 2
;
run;
First, transpose by ID, using the values of A (with the prefix A) to name the new columns:
proc transpose data = temp
out = temp2
prefix = A;
by id;
var A;
id A;
run;
Then, for all variables beginning with A, replace missing values with 0 and non-missing values with 1. The RETAIN statement, placed before SET, puts the variables in the order ID, A1, A2, A3:
data temp3 (drop = _name_);
retain id A1 A2 A3;
set temp2;
array change A:;
do over change;
if change~=. then change=1;
if change=. then change=0;
end;
run;
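Since the question asks for a DATA step, here is a sketch that avoids PROC TRANSPOSE: it retains three indicator variables per ID, resets them at FIRST.ID, sets the one matching A, and writes a single row at LAST.ID. It assumes the input is sorted by ID and that A only ever takes the values 1, 2, or 3 (any other value would be an out-of-range array subscript). The dataset names are hypothetical.

```sas
data have_ind;
   input id A;
   datalines;
1 2
1 3
2 1
3 1
3 2
;
run;

data want_ind;
   set have_ind;
   by id;                       /* requires HAVE_IND sorted by ID */
   array ind[3] A1-A3;          /* indicator for A = 1, 2, 3 */
   retain A1-A3;                /* keep the flags across rows of the same ID */
   if first.id then do i = 1 to dim(ind);
      ind[i] = 0;               /* reset all indicators for a new ID */
   end;
   ind[A] = 1;                  /* flag the category seen on this row */
   if last.id then output;      /* one row per ID */
   keep id A1-A3;
run;
```

Because the variables are defined in the order SET, then ARRAY, the output already comes out as ID, A1, A2, A3 without a separate reordering step.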
I have the following dataset:
ID CODE
1 A
1 B
2 A
2 A
2 B
3 A
3 B
I would like to add a third column that gives a sequence number, as shown below:
ID CODE SEQ
1 A 1
1 B 2
2 A 1
2 A 1
2 B 2
3 A 1
3 B 2
How can I achieve this with a RETAIN statement rather than by hard-coding A as 1 and B as 2?
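Look at BY-group processing and the FIRST. variables. Something like this will work: for each ID, initialize SEQ to zero, and for each new CODE within it, increment SEQ by one. The sum statement SEQ+1 retains SEQ automatically, so no explicit RETAIN statement is needed.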
data want;
set have;
by id code;
if first.id then seq=0;
if first.code then seq+1;
run;
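The BY statement in that step requires HAVE to be sorted by ID and CODE. A self-contained run with the question's data, sorting first, might look like this (the dataset names are hypothetical):

```sas
data have_seq;
   input id code $;
   datalines;
1 A
1 B
2 A
2 A
2 B
3 A
3 B
;
run;

proc sort data=have_seq;
   by id code;                  /* BY-group processing below needs this order */
run;

data want_seq;
   set have_seq;
   by id code;
   if first.id then seq = 0;    /* new ID: restart the sequence */
   if first.code then seq + 1;  /* new CODE within the ID: next number */
run;
```

Repeated codes within an ID (the two "2 A" rows) share the same SEQ, because FIRST.CODE is only 1 on the first of them.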
I have two datasets with the following structure:
ID1 Cat1
1 a
2 a
3 b
4 b
5 b
6 c
7 d
and
ID2 Cat2
11 z
12 z
13 z
14 y
15 x
I want to combine them side by side (column-wise) and have the unmatched rows just be missing. So ultimately I want:
ID1 Cat1 ID2 Cat2
1 a 11 z
2 a 12 z
3 b 13 z
4 b 14 y
5 b 15 x
6 c
7 d
The purpose is that I have two datasets, each sorted by its ID, and I want to match the first category (Cat1) with the second (Cat2). The second category has a predefined number of "slots", and those slots should be filled in the order of the IDs. The only relationship between ID1 and ID2 is that they are ordered the same way, so the two lowest IDs should be matched, and so on.
You want a one-to-one merge.
See the SAS documentation on one-to-one merging.
To do a one-to-one merge, just use MERGE without a BY statement.
This type of merge simply matches observations by their row number, so be careful: it may give unintended results if a row you expected is missing or something else isn't as you expected.
For example:
proc sort data = have1; by id1; run;
proc sort data = have2; by id2; run;
data want;
   merge have1 have2;   /* no BY statement: rows are paired by position */
run;
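A self-contained sketch with the question's data, using ID1 values 1 to 7 as in the desired output. The datasets are already in ID order, so the PROC SORT steps are skipped here. When HAVE2 runs out of rows, SAS sets ID2 and CAT2 to missing for the remaining observations, which produces the blank cells in the desired output.

```sas
data have1;
   input id1 cat1 $;
   datalines;
1 a
2 a
3 b
4 b
5 b
6 c
7 d
;
run;

data have2;
   input id2 cat2 $;
   datalines;
11 z
12 z
13 z
14 y
15 x
;
run;

/* Observation n of HAVE1 is paired with observation n of HAVE2;
   rows 6 and 7 get missing ID2/CAT2 because HAVE2 has only 5 rows */
data want;
   merge have1 have2;
run;

proc print data=want noobs;
run;
```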