Formatting a variable - sas

I have the following format:
value agecf 0 = "35-40" 1 = "41-45" 2 = "46-50" 3 = "51-55" 4 = "56-60";
But when I then apply it with format age agecf.; I still see every individual age (e.g. 35, 36, 37, ...) instead of the observations grouped into 5 levels. Why?

You reversed the left and right sides of the format definition. The original (raw) value goes on the left and the formatted value goes on the right.
Below is an example using your format alongside one that is probably what you were trying to create.
proc format;
value agecf 0 = "35-40" 1 = "41-45" 2 = "46-50" 3 = "51-55" 4 = "56-60";
value newage 35-40="0" 41-45="1" 46-50="2" 51-55="3" 56-60="4";
run;
data test;
input value1;
value2=value1;
format value1 agecf. value2 newage.;
datalines;
35
45
50
37
46
55
60
;
proc print data=test;run;
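If the goal is to see the ages grouped into the five bands with readable labels, a sketch along the following lines may be closer to what the question intended. The format name agegrp and the dataset name ages are made up for illustration; the ranges are taken from the labels in the question.

```sas
proc format;
  /* raw age ranges on the left, display labels on the right */
  value agegrp 35-40 = "35-40"
               41-45 = "41-45"
               46-50 = "46-50"
               51-55 = "51-55"
               56-60 = "56-60";
run;

data ages;
  input age;
  datalines;
35
45
50
37
46
55
60
;
run;

/* PROC FREQ groups observations by their formatted value */
proc freq data=ages;
  tables age;
  format age agegrp.;
run;
```

The stored values of age are unchanged; only the way they are displayed and grouped changes.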

Related

How to write a foreach loop statement in SAS?

I'm working in SAS as a novice. I have two datasets:
Dataset1
UniqueID ColumnA
1 15
1 39
2 20
3 10
Dataset2
UniqueID ColumnB
1 40
2 55
2 30
For each UniqueID, I want to subtract each value of ColumnA from all values of ColumnB with the same UniqueID, and create a NewColumn that is 1 whenever 1 < ColumnB - ColumnA < 30 for at least one of them. For the first row of Dataset1, where UniqueID = 1, I want SAS to go through all the rows in Dataset2 that also have UniqueID = 1 and determine whether there are any rows where the difference between ColumnB and ColumnA is greater than 1 and less than 30. For the first row of Dataset1, NewColumn should be 1 because 40 - 15 = 25. For the second row of Dataset1, NewColumn should be 0 because 40 - 39 = 1 (which is not greater than 1). For the third row of Dataset1, I again want SAS to go through every row of Dataset2 that has the same UniqueID, so 55 - 20 = 35 (which is greater than 30), but NewColumn would still be 1 because (moving to row 3 of Dataset2, which also has UniqueID = 2) 30 - 20 = 10, which satisfies the condition.
So I want my output to be:
UniqueID ColumnA NewColumn
1 15 1
1 39 0
2 20 1
I have tried concatenating Dataset1 and Dataset2 into a FullDataset. Then I tried using a do loop statement but I can't figure out how to do the loop for each value of UniqueID. I tried using BY but that of course produces an error because that is only used for increments.
DATA FullDataset;
set Dataset1 Dataset2; /*Concatenate datasets*/
do i=ColumnB-ColumnA by UniqueID;
if 1<ColumnB-ColumnA<30 then NewColumn=1;
output;
end;
RUN;
I know I'm probably way off but any help would be appreciated. Thank you!
The approach that answers your question most directly is a keyed SET. This isn't necessarily how I'd do it, but it is fairly simple to understand (as opposed to a hash table, which is what I'd use, or a SQL join, which is probably what most people would use). It does exactly what you describe: it grabs a row of A and, for each matching row of B, checks a condition. It requires an index on the datasets (well, at least on the B dataset).
data colA(index=(id));
input ID ColumnA;
datalines;
1 15
1 39
2 20
3 10
;;;;
data colB(index=(id));
input ID ColumnB;
datalines;
1 40
2 55
2 30
;;;;
run;
data want;
*base: the colA dataset - you want to iterate through that once per row;
set colA;
*now, loop while the check variable shows 0 (match found);
do while (_iorc_ = 0);
*bring in other dataset using ID as key;
set colB key=ID ;
* check to see if it matches your requirement, and also only check when _IORC_ is 0;
if _IORC_ eq 0 and 1 lt ColumnB-ColumnA lt 30 then result=1;
* This is just to show you what is going on, can remove;
put _all_;
end;
*reset things for next pass;
_ERROR_=0;
_IORC_=0;
run;
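For comparison, the SQL join mentioned above could look roughly like the sketch below. It is not the answer's method, only an illustration under a few assumptions: the output table name want_sql is made up, the chained comparison is written out as two conditions joined by AND, and a left join is used so that IDs with no match in colB get 0.

```sas
data colA;
  input ID ColumnA;
  datalines;
1 15
1 39
2 20
3 10
;
run;

data colB;
  input ID ColumnB;
  datalines;
1 40
2 55
2 30
;
run;

proc sql;
  create table want_sql as
  select a.ID,
         a.ColumnA,
         /* 1 if any matching colB row gives a difference strictly between 1 and 30 */
         max(case
               when b.ColumnB - a.ColumnA > 1
                and b.ColumnB - a.ColumnA < 30 then 1
               else 0
             end) as NewColumn
  from colA as a
       left join colB as b
       on a.ID = b.ID
  group by a.ID, a.ColumnA;
quit;
```

Because the query groups by ID and ColumnA, two colA rows that share both values would collapse into one output row; if that matters, a row identifier would need to be added to colA first.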

How to add a column of repeated numbers in SAS?

How to generate a repeating series of numbers in a column in SAS, from 1 to x?
Suppose x is 3.
Data is like:
name age
A 15
D 16
C 21
B 35
E 79
F 85
G 64
and I want to add a column named list, like this:
name age list
A 15 1
D 16 2
C 21 3
B 35 1
E 79 2
F 85 3
G 64 1
data class;
set sashelp.class;
if list>=3 then list=0;
list+1;
run;
Easiest way I can think of is to use mod and the iteration counter.
data want;
set have;
list = 1 + mod(_N_ - 1,3);
run;
mod is the modulo function (gives the remainder after dividing).
If you want the cycle length to vary, replace the 3 with a macro variable:
%let num_atwork = 2;
data want;
set have;
list = 1 + mod(_N_ - 1, &num_atwork.);
run;

SAS: Condense separate measurement variables across category

I have a data set whose variables represent two kinds of information: a variable measurement and a category.
For instance, Var1A measures the first variable (e.g. blood pressure) of Category A (e.g. male/female) whereas Var2B measures the second variable (e.g. heart rate) of Category B (e.g. male/female).
Key Var1A Var2A Var1B Var2B
--- ----- ----- ----- -----
002 1 2 3 4
031 5 6 7 8
028 9 10 11 12
I need each measurement variable to be condensed across the category type.
Key Type Var1 Var2
--- ---- ---- ----
002 A 1 2
002 B 3 4
028 A 9 10
028 B 11 12
031 A 5 6
031 B 7 8
The sorting of the condensed data set is unimportant to me.
What I have come up with works and yields the data sets seen above. I basically brute forced/fiddled my way to this solution. However, I wonder if there is a more direct/intuitive way to do it, possibly without needing to sort first and drop so many variables.
data have;
input key $ Var1A Var2A Var1B Var2B;
datalines;
002 1 2 3 4
031 5 6 7 8
028 9 10 11 12
;
run;
proc sort data = have out = step1_sort;
by key;
run;
proc transpose data = step1_sort out = step2_transpose;
by key;
run;
data step3_assign_type_and_variable (drop = _NAME_);
set step2_transpose ;
if _NAME_ = 'Var1A' then do;
variable = 'Var1';
type = 'A';
end;
else if _NAME_ = 'Var1B' then do;
variable = 'Var1';
type = 'B';
end;
else if _NAME_ = 'Var2A' then do;
variable = 'Var2';
type = 'A';
end;
else if _NAME_ = 'Var2B' then do;
variable = 'Var2';
type = 'B';
end;
run;
proc transpose data = step3_assign_type_and_variable
out = step4_get_want (drop = _NAME_);
var col1;
by key type;
id variable;
run;
I came up with the same method except replacing your brute force with cleaner substrings:
** use this step to replace your brute force code **;
data step3_assign_type_and_variable;
set step2_transpose;
type = upcase(substr(_name_,length(_name_),1));
variable = propcase(substr(_name_,1,4));
drop _name_;
run;
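Since the question also asks for a way that avoids the sort and the transposes, one possible alternative is a single data step that writes two output rows per input row. This is only a sketch that assumes exactly two categories (A and B) and two measurements, as in the sample data; the output dataset name want is made up.

```sas
data have;
  input key $ Var1A Var2A Var1B Var2B;
  datalines;
002 1 2 3 4
031 5 6 7 8
028 9 10 11 12
;
run;

data want;
  set have;
  length Type $1;
  /* first output row: the category A measurements */
  Type = 'A'; Var1 = Var1A; Var2 = Var2A; output;
  /* second output row: the category B measurements */
  Type = 'B'; Var1 = Var1B; Var2 = Var2B; output;
  keep key Type Var1 Var2;
run;
```

The rows come out in input order rather than sorted by key, which the question says is acceptable. With many categories or measurements, the transpose approach scales better than writing out each assignment by hand.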

Transform numbers with 0 values at the beginning

I have the following dataset:
DATA survey;
INPUT zip_code number;
DATALINES;
1212 12
1213 23
1214 23
;
PROC PRINT; RUN;
I want to link this data to another table but the thing is that the numbers in the other table are stored in the following format: 0012, 0023, 0023.
So I am looking for a way to do the following:
Check how long the number is
If length = 1, add 3 0 values to the beginning
If length = 2, add 2 0 values to the beginning
Any thoughts on how I can get this working?
Numbers are numbers so if the other table has the field as a number then you don't need to do anything. 13 = 0013 = 13.00 = ....
If the other table actually has a character variable then you need to convert one or the other.
char_number = put(number, Z4.);
number = input(char_number, 4.);
You can use z#. formats to accomplish this:
DATA survey;
INPUT zip_code number;
DATALINES;
1212 12
1213 23
1214 23
9999 999
8888 8
;
data survey2;
set survey;
number_long = put(number, z4.);
run;
If number is already a character variable and you need it padded to four characters, convert it to numeric first and then apply the format:
want = put(input(number,best32.),z4.);

sas recursive lag by id

I am trying to do a recursive lag in SAS. The problem, as I just learned, is that x = lag(x) does not work in SAS.
The data I have is similar in format to this:
id date count x
a 1/1/1999 1 10
a 1/1/2000 2 .
a 1/1/2001 3 .
b 1/1/1997 1 51
b 1/1/1998 2 .
What I want is that given x for the first count, I want each successive x by id to be the lag(x) + some constant.
For example, let's say: if count > 1 then x = lag(x) + 3.
The output that I would want is:
id date count x
a 1/1/1999 1 10
a 1/1/2000 2 13
a 1/1/2001 3 16
b 1/1/1997 1 51
b 1/1/1998 2 54
Yes, the lag function in SAS requires some understanding. You should read through the documentation on it (http://support.sas.com/documentation/cdl/en/lefunctionsref/67398/HTML/default/viewer.htm#n0l66p5oqex1f2n1quuopdvtcjqb.htm)
When you have conditional statements with a lag inside the "then", I tend to use a retained variable.
data test;
informat date anydtdte.;
input id $ date count x;
format date date9.;
datalines;
a 1/1/1999 1 10
a 1/1/2000 2 .
a 1/1/2001 3 .
b 1/1/1997 1 51
b 1/1/1998 2 .
;
run;
data test(drop=last);
set test;
by id;
retain last;
if ^first.id then do;
if count > 1 then
x = last + 3;
end;
last = x;
run;