SAS - How to make dataset wide to long when some values are missing?

I have a dataset that looks basically like this:
LOCID  Name  Addtl Loc 1  Addtl Loc 2  Addtl Loc 3
1      A     2            3            5
1      B     2
1      C     2            4
And I would like to make it look like this:
LOCID Name Gender
1 A F
2 A F
3 A F
5 A F
1 B M
2 B M
1 C F
2 C F
4 C F
So, I'd like to keep the attributes for each person but have a row for each of their locations. I also don't currently have a unique ID or any variable to identify each of the people but I could make one. I'm working in SAS. Does anyone have suggestions on how to do this?
I have been looking up wide to long methods but am having trouble understanding them.

It looks to me like you could just use a DO LOOP to transpose the data.
So assuming your input data set has LOCID and ADD_LOCID1 to ADD_LOCID3 plus any other variables, such as NAME and GENDER, you could just do the following to add an extra observation for every non-missing value found in the extra locid variables.
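data want;
  set have;
  array list add_locid1 - add_locid3;
  /* keep the original row with its own LOCID */
  output;
  /* write one more row per non-missing additional location */
  do index=1 to dim(list);
    locid = list[index];
    if not missing(locid) then output;
  end;
  drop index add_locid1-add_locid3 ;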
run;
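To try the step above, a HAVE dataset matching the question's sample could be built as follows. This is only a sketch: the variable names ADD_LOCID1 to ADD_LOCID3 follow the answer's assumption, the GENDER values are taken from the desired output, and the missing additional locations are entered as periods. Run it before the DATA WANT step.

```sas
/* Sample input; the names add_locid1-add_locid3 and gender are assumed */
data have;
  input locid name $ gender $ add_locid1-add_locid3;
  datalines;
1 A F 2 3 5
1 B M 2 . .
1 C F 2 4 .
;
run;
```

With this input, WANT should contain nine rows: four for A, two for B and three for C.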

Related

Using a counter to find multiple occurrences on same day from ID & date

I am trying to find when a person has multiple occurrences on the same day and when they do not.
My data looks something like this
data have;
input id date $ ;
datalines ;
1 nov10
1 nov15
2 nov11
2 nov11
2 nov14
3 nov12
4 nov17
4 nov19
4 nov19
etc...;
I want to create a new variable to show whether an occurrence happens on the same day as another one or not. I want my end result to look like
data want;
input id date occ;
1 nov10 1
1 nov15 1
2 nov11 2
2 nov11 2
2 nov14 1
3 nov12 1
4 nov17 1
4 nov19 2
4 nov19 2
etc...;
This is what I tried, but it does not flag every row of a repeated date; it only marks the rows after the first one. Here is my code:
data want ;
set have ;
by id date;
if first.date then occ = 1;
else occ = 2;
run;
Your IF/THEN logic is just a complicated way to do
occ = 1 + not first.date;
Which is just a test of whether or not it is the first observation for this date.
Looks like you want to instead test whether or not there are multiple observations per date.
occ = 1 + not (first.date and last.date) ;
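Put together with the question's sample rows, the suggestion might look like the sketch below. It assumes DATE is read as a character value (the question does not say how it is stored) and that the data are already sorted by ID and DATE, which BY-group processing requires.

```sas
/* Sample rows from the question; date is assumed to be character */
data have;
  input id date $;
  datalines;
1 nov10
1 nov15
2 nov11
2 nov11
2 nov14
3 nov12
4 nov17
4 nov19
4 nov19
;
run;

data want;
  set have;
  by id date;
  /* a row that is both first and last in its id/date group is the only one */
  occ = 1 + not (first.date and last.date);
run;
```

Note that this marks every row of a repeated date with 2, including dates that occur three or more times; it flags repetition rather than counting it.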

Replacing values with the value in first row SAS

I want to replace the values with the first row by group. My data looks like this:
ID Value
A 5
A 4
A 3
B 4
B 3
C 4
And I want the final data to look like this, where every row of an ID has the same value as that ID's first row:
ID Value
A 5
A 5
A 5
B 4
B 4
C 4
How should I write the code? Thank you so much!
Try this
data have;
input ID $ value;
datalines;
A 5
A 4
A 3
B 4
B 3
C 4
;
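data want;
  set have;
  by ID;
  /* _IORC_ is an automatic variable: it keeps its value across iterations
     and is not written to the output, so it can hold the first value */
  if first.ID then _iorc_ = value;
  else value = _iorc_;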
run;
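If borrowing _IORC_ feels too implicit, the same idea can be written with an explicit RETAIN. This is a sketch of an alternative, not the answerer's code; FIRST_VALUE is a hypothetical helper variable, and the data are assumed to be sorted by ID.

```sas
data have;
  input ID $ value;
  datalines;
A 5
A 4
A 3
B 4
B 3
C 4
;
run;

data want;
  set have;
  by ID;
  retain first_value;                       /* hypothetical helper, kept across rows */
  if first.ID then first_value = value;     /* remember the group's first value */
  else value = first_value;                 /* overwrite later rows with it */
  drop first_value;
run;
```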

Calculating median across multiple rows and columns in SAS 9.4

I tried searching multiple places but have not been able to find a solution yet. I was wondering if someone here would be able to please help me?
I am trying to calculate a median value (with Q1 and Q3) across multiple rows and columns in SAS 9.4. The dataset I am working with looks like the following:
Obs tumor_size_1 tumor_size_2 tumor_size_3 tumor_size_4
1 4 1.5 1 1
2 2.5 2 . .
3 3 . . .
4 4 . . .
5 3.5 1 . .
The context is this is for a medical condition where a person may have 1 (or more) tumors. Each row represents 1 person. Each person may have up to 4 tumors. I would like to determine the median size of all tumors for the entire cohort (not just the median size for each person). Is there a way to calculate this? Thank you in advance.
A transpose of the data will yield a data structure (form) that is amenable to median and quartile computations, at a variety of aggregate combinations, made with PROC SUMMARY and a CLASS statement.
Example:
data have;
  input patient tumor_size_1 tumor_size_2 tumor_size_3 tumor_size_4;
  datalines;
1 4 1.5 1 1
2 2.5 2 . .
3 3 . . .
4 4 . . .
5 3.5 1 . .
;
proc transpose data=have out=new_have;
  by patient;
  var tumor:;
run;
proc summary data=new_have;
  class patient;
  var col1;
  output out=want Q1=Q1 Q3=Q3 MEDIAN=MEDIAN N=N;
run;
Results
patient _TYPE_ _FREQ_ Q1 Q3 MEDIAN N
. 0 20 1 3.50 2.25 10
1 1 4 1 2.75 1.25 4
2 1 4 2 2.50 2.25 2
3 1 4 3 3.00 3.00 1
4 1 4 4 4.00 4.00 1
5 1 4 1 3.50 2.25 2
The _TYPE_ column describes how the CLASS variables are combined to produce the requested statistics. The _TYPE_ = 0 row covers all values. In this problem, _FREQ_ = 20 means 20 observations (5 patients x 4 transposed rows) were considered, and N = 10 means 10 of those were non-missing and entered the computation. The role of _TYPE_ becomes more obvious when there is more than one CLASS variable.
From the Output Data Set documentation:
the variable _TYPE_ that contains information about the class variables. By default _TYPE_ is a numeric variable. If you specify CHARTYPE in the PROC statement, then _TYPE_ is a character variable. When you use more than 32 class variables, _TYPE_ is automatically a character variable.
and
The value of _TYPE_ indicates which combination of the class variables PROC MEANS uses to compute the statistics. The character value of _TYPE_ is a series of zeros and ones, where each value of one indicates an active class variable in the type. For example, with three class variables, PROC MEANS represents type 1 as 001, type 5 as 101, and so on.
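Since the question only asks for the cohort-wide statistics, one option is to leave out the CLASS statement so that only the overall (_TYPE_ = 0) row is produced. The sketch below repeats the input and transpose steps so it runs on its own; WANT_ALL is a hypothetical output name.

```sas
data have;
  input patient tumor_size_1 tumor_size_2 tumor_size_3 tumor_size_4;
  datalines;
1 4 1.5 1 1
2 2.5 2 . .
3 3 . . .
4 4 . . .
5 3.5 1 . .
;
run;

/* one row per patient and tumor slot; the sizes land in COL1 */
proc transpose data=have out=new_have;
  by patient;
  var tumor:;
run;

/* no CLASS statement: a single row of statistics over all tumors */
proc summary data=new_have;
  var col1;
  output out=want_all Q1=Q1 Q3=Q3 MEDIAN=MEDIAN N=N;
run;
```

The single row in WANT_ALL should match the _TYPE_ = 0 row of the results shown earlier.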
A far less elegant way to compute the median of all is to store all the values in an oversized array and use the MEDIAN function on the array after the last row is read in:
data median_all;
  set have end=lastrow;
  array values [1000000] _temporary_;
  array sizes tumor_size_1-tumor_size_4;
  do sIndex = 1 to dim(sizes);
    /* if not missing (sizes[sIndex]) then do; */ %* uncomment for dense fill;
    vIndex + 1;
    values[vIndex] = sizes[sIndex];
    /* end; */ %* uncomment for dense fill;
  end;
  if lastrow then do;
    median_all_tumor_sizes = median (of values(*));
    output;
    put (median:) (=);
  end;
  keep median:;
run;
-------- LOG -------
median_all_tumor_sizes=2.25

reshape wide to long in stata but new variable contains all missing values

I need to reshape a dataset, whose original form is as follows:
schid m2s1q0_i m2s1q0_ii ... m2s1q0_x
1 6 2 3
I want to reshape it into the long format, like this:
schid teacher_id
1 5
1 2
...
1 3
I used this code:
reshape long m2s1q0_, i(schoolid) j(teacher_id)
However, the teacher_id variable is all missing. Where did it go wrong?
The suffixes (i, ii, ..., x) are not numeric, so reshape needs the string option. With it, teacher_id is created as a string variable holding the suffix rather than coming out missing. You can then use encode to create numeric values for the teacher_id variable.
Here is an example:
clear
set obs 10
gen schid = _n
gen m_i = 1
gen m_ii = 2
gen m_iii = 3
reshape long m_, i(schid) j(teacher_id) string
encode teacher_id, gen(teacher_id2)

SAS in data step reshape a long data into wide with categorical variable

In SAS, a dataset I have is as follows.
id A
1 2
1 3
2 1
3 1
3 2
ID is given to each individual and A is a categorical variable which takes 1, 2 or 3. I want to get the data with one observation per each individual separating A into three indicator variables, say A1, A2 and A3.
The result would look like this:
id A1 A2 A3
1 0 1 1
2 1 0 0
3 1 1 0
Does anyone have any thought how to do this in data step, not in sql? Thanks in advance.
So you're on the right track; PROC TRANSPOSE is definitely the way to go:
data temp;
input id A;
datalines;
1 2
1 3
2 1
3 1
3 2
;
run;
First you want to transpose by id, using the variable A:
proc transpose data = temp
out = temp2
prefix = A;
by id;
var A;
id A;
run;
And then, for all variables beginning with A, you want to replace all missing values with 0s and all non-missing values with 1s. The retain statement here reorders your variables:
data temp3 (drop = _name_);
  retain id A1 A2 A3;   /* fixes the column order */
  set temp2;
  array change A:;
  do over change;
    /* non-missing becomes 1, missing becomes 0 */
    if change ~= . then change = 1;
    else change = 0;
  end;
run;
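Because the question asks for a DATA step solution, here is a sketch that skips PROC TRANSPOSE altogether. It is an alternative to the answer above, not part of it, and it assumes the data are sorted by id and that A only takes the values 1, 2 or 3. The array name IND and the loop index I are hypothetical.

```sas
data temp;
  input id A;
  datalines;
1 2
1 3
2 1
3 1
3 2
;
run;

data want;
  set temp;
  by id;
  array ind[3] A1-A3;
  retain A1-A3;                       /* keep the flags across rows of one id */
  if first.id then do i = 1 to dim(ind);
    ind[i] = 0;                       /* reset the flags at the start of each id */
  end;
  if 1 <= A <= dim(ind) then ind[A] = 1;   /* switch on the flag for this category */
  if last.id then output;             /* one row per id */
  drop i A;
run;
```

For the sample data this should give the three rows shown in the question: 0 1 1 for id 1, 1 0 0 for id 2 and 1 1 0 for id 3.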