How to transpose dataset more simply - sas

I'd like to reshape the dataset into the layout shown below. I got it to work, but the program is long.
I think it could be simpler. If you have a good idea, please give me some advice.
This is the data.
data test;
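* :$10. is used for the dates because the default character length of 8 would truncate values such as 2020/8/18;
input ID $ NO DAT1 :$10. TIM1 $ DAT2 :$10. TIM2 $;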
cards;
1 1 2020/8/4 8:30 2020/8/5 8:30
1 2 2020/8/18 8:30 2020/8/19 8:30
1 3 2020/9/1 8:30 2020/9/2 8:30
1 4 2020/9/15 8:30 2020/9/16 8:30
2 1 2020/8/4 8:34 2020/8/5 8:34
2 2 2020/8/18 8:34 2020/8/19 8:34
2 3 2020/9/1 8:34 2020/9/2 8:34
2 4 2020/9/15 8:34 2020/9/16 8:34
3 1 2020/8/4 8:46 2020/8/5 8:46
3 2 2020/8/18 8:46 2020/8/19 8:46
3 3 2020/9/1 8:46 2020/9/2 8:46
3 4 2020/9/15 8:46 2020/9/16 8:46
;
run;
This is my program.
data
t1(keep = ID A1 A2 A3 A4)
t2(keep = ID B1 B2 B3 B4)
t3(keep = ID C1 C2 C3 C4)
t4(keep = ID D1 D2 D3 D4);
set test;
if NO = 1 then do;
A1 = DAT1;
A2 = TIM1;
A3 = DAT2;
A4 = TIM2;
end;
*--- cut (NO = 2, 3, 4 are same as NO = 1)--- ;
end;
if NO = 1 then output t1;
if NO = 2 then output t2;
if NO = 3 then output t3;
if NO = 4 then output t4;
run;
proc sort data = t1;by ID; run;
proc sort data = t2;by ID; run;
proc sort data = t3;by ID; run;
proc sort data = t4;by ID; run;
data test2;
merge t1 t2 t3 t4;
by ID;
run;

Since the result looks like a report, use a reporting tool.
proc report data=test ;
column id no,(dat1 tim1 dat2 tim2 n) ;
define id / group width=5;
define no / across ' ' ;
define n / noprint;
run;

Tall-to-very-wide data transformations are typically either sketchy (you put data into metadata such as column names or labels, or you lose referential context) or a reporting layout for human consumption.
The approaches below presume that your "dataset like the below" is accurate and that you do want to pivot your data in that manner.
Way 1 - self merging subsets with renaming
The NO field is a sequence number within each ID, so it can be used to split the data into subsets that are then merged back together BY ID, renaming the columns of each subset.
Consider this example code as a template that a macro could generate.
In it, NO has been renamed to seq for clarity and the data set is called have:
data want;
merge
have (where=(seq=1) rename=(dat1=A1 tim1=B1 dat2=C1 tim2=D1))
have (where=(seq=2) rename=(dat1=A2 tim1=B2 dat2=C2 tim2=D2))
have (where=(seq=3) rename=(dat1=A3 tim1=B3 dat2=C3 tim2=D3))
have (where=(seq=4) rename=(dat1=A4 tim1=B4 dat2=C4 tim2=D4))
;
by id;
* seq no longer identifies anything once the rows are merged;
drop seq;
run;
For arbitrary data sets organized in the same pattern, the code generation requirements follow directly: determine the maximum seq, and pass the names of the variables to pivot as macro parameters, with the macro looping over those names.
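The code generation described above could be sketched as a macro along these lines. This is only a sketch under stated assumptions: the macro name pivot_merge and its parameters are hypothetical, the input is assumed to be sorted by the BY variable with one row per sequence value, and the new columns are named with a _n suffix (dat1_1, dat1_2, ...) rather than A1 to D4.

```sas
/* Hypothetical macro: pivot_merge and all its parameters are illustrative names */
%macro pivot_merge(data=, out=, by=, seq=, vars=);
  %local maxseq nvars i j var;

  /* Find the highest sequence number present in the data */
  proc sql noprint;
    select max(&seq) into :maxseq from &data;
  quit;
  %let maxseq = &maxseq;  /* re-assignment trims leading blanks */
  %let nvars  = %sysfunc(countw(&vars, %str( )));

  data &out;
    merge
    %do i = 1 %to &maxseq;
      /* one subset per sequence value, each with its own renames */
      &data (where=(&seq=&i) rename=(
        %do j = 1 %to &nvars;
          %let var = %scan(&vars, &j, %str( ));
          &var=&var._&i
        %end;
      ))
    %end;
    ;
    by &by;
    drop &seq;
  run;
%mend pivot_merge;

/* Usage with the data set from the question, where the sequence variable is NO */
%pivot_merge(data=test, out=want, by=id, seq=no, vars=dat1 tim1 dat2 tim2);
```

Because the renames are generated from the variable list, the same call pattern should extend to more sequence values or more variables without editing the merge by hand.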
Way 2 - multiple transposes
Caution: when the VAR list contains any character column, all pivoted columns will be character and hold the formatted values of the originals.
proc transpose data=have(rename=(dat1=A tim1=B dat2=C tim2=D)) out=stage1;
by id seq;
var a b c d;
run;
proc transpose data=stage1 out=want;
by id;
var col1;
id _name_ seq;
run;
Way 3 - Use array and DOW loop
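* presume SEQ is indeed a unit monotonic sequence value;
* presume HAVE is sorted by ID and the source columns are character, adjust or drop the $ 10 to match your data;
data want;
do until (last.id);
set have;
by id;
array wide $ 10 A1-A4 B1-B4 C1-C4 D1-D4;
wide [ (seq-1)*4 + 1 ] = dat1;
wide [ (seq-1)*4 + 2 ] = tim1;
wide [ (seq-1)*4 + 3 ] = dat2;
wide [ (seq-1)*4 + 4 ] = tim2;
end;
keep id A1--D4;
* the formats below apply only if the values are numeric dates and times;
* format A1 A3 B1 B3 C1 C3 D1 D3 your-date-format;
* format A2 A4 ................. your-time-format;
run;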
Way 4 - change your data values to datetime
I'll leave this to esteemed others

Related

Aggregate multiple vars on different groupings in one Proc SQL query

I need to aggregate about ten different vars on different groupings using Proc SQL.
Is there a way to achieve SUM () OVER ( [ partition_by_clause ] order_by_clause) in one sql query with different partition by clauses?
I've made an example here
data have;
infile cards;
input a b c d e;
cards;
1 2 3 4 5
2 2 4 5 6
1 4 3 4 7
3 4 4 5 8
;
run;
proc sql;
create table want as
select *,
sum(a) over (partition by b, c) as a1,
sum(b) over (partition by c, d) as b1,
sum(c) over (partition by d, e) as c1,
sum(d) over (partition by a, c) as d1
from have
;
quit;
I don't want to write multiple sql queries, each grouping on different vars and calculating one var per step.
Hope that makes sense.
Proc SQL does not implement window functions, so the OVER (PARTITION BY ...) syntax found in other SQL implementations is not available. You can only use partition by through pass-through SQL to a connection that supports such syntax.
You could perform such a computation in DATA step using hashes.
data have;
infile cards;
input a b c d e ;
cards;
1 2 3 4 5
2 2 4 5 6
1 4 3 4 7
3 4 4 5 8
;
run;
data want;
if 0 then set have;
length a1 b1 c1 d1 8;
declare hash a1s();
a1s.defineKey('b', 'c');
a1s.defineData('a1');
a1s.defineDone();
declare hash b1s();
b1s.defineKey('c', 'd');
b1s.defineData('b1');
b1s.defineDone();
declare hash c1s();
c1s.defineKey('d', 'e');
c1s.defineData('c1');
c1s.defineDone();
declare hash d1s();
d1s.defineKey('a', 'c');
d1s.defineData('d1');
d1s.defineDone();
do while (not end);
set have end=end;
if a1s.find() = 0 then a1+a; else a1=a; a1s.replace();
if b1s.find() = 0 then b1+b; else b1=b; b1s.replace();
if c1s.find() = 0 then c1+c; else c1=c; c1s.replace();
if d1s.find() = 0 then d1+d; else d1=d; d1s.replace();
end;
do while (not last);
set have end=last;
a1s.find();
b1s.find();
c1s.find();
d1s.find();
output;
end;
format _numeric_ 4.;
stop;
run;
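If staying inside a single Proc SQL query matters more than speed, correlated scalar subqueries are another way to express the same sums. The following is a minimal sketch, assuming the have data set above; the aliases h and x are arbitrary, and each subquery rescans have once per output row, so it may be slow on large tables.

```sas
proc sql;
  create table want as
  select h.*,
         /* each subquery sums one variable over the rows sharing the grouping keys */
         (select sum(x.a) from have as x where x.b = h.b and x.c = h.c) as a1,
         (select sum(x.b) from have as x where x.c = h.c and x.d = h.d) as b1,
         (select sum(x.c) from have as x where x.d = h.d and x.e = h.e) as c1,
         (select sum(x.d) from have as x where x.a = h.a and x.c = h.c) as d1
  from have as h;
quit;
```

The hash approach reads the data twice in total, whereas this form trades that efficiency for a single declarative statement.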

Combine information into a department vector

I want to summarize a dataset by creating a vector that gives information on what departments the id is found in. For example,
data test;
input id dept $;
datalines;
1 A
1 D
1 B
1 C
2 C
3 D
4 A
5 C
5 D
;
run;
I want
id dept_vect
1 1111
2 0010
3 0001
4 1000
5 0011
The position of the elements of the dept_vect is organized alphabetically. So a '1' in the first position means that the id is found in department A and a '1' in the second position means that the id is found in department B. A '0' means the id is not found in the department.
I can solve this problem using a brute force approach
proc transpose data = test out = test1(drop = _NAME_);
by id;
var dept;
run;
data test2;
set test1;
array x[4] $ col1-col4;
array d[4] $ d1-d4;
do i = 1 to 4;
if not missing(x[i]) then do;
if x[i] = 'A' then d[1] = 1;
else if x[i] = 'B' then d[2] = 1;
else if x[i] = 'C' then d[3] = 1;
else if x[i] = 'D' then d[4] = 1;
end;
else leave;
end;
do i = 1 to 4;
if missing(d[i]) then d[i] = 0;
end;
dept_id = compress(d1) || compress(d2) || compress(d3) || compress(d4);
keep id dept_id;
run;
This works, but there are a couple of problems. For col4 to appear, at least one id must be found in all departments, though that could be fixed by creating a dummy id that is found in all departments. The main problem is that this code is not robust. Is there a way to code this so that it works for any number of departments?
1. Add a 1 to get a count variable
2. Transpose using PROC TRANSPOSE
3. Replace missing with 0
4. Use CATT() to create desired results.
data have;
input id dept $;
count = 1;
datalines;
1 A
1 D
1 B
1 C
2 C
3 D
4 A
5 C
5 D
;
run;
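/* transpose HAVE, the data set that carries COUNT */
proc transpose data=have out=wide prefix=dept;
by id;
id dept;
var count;
run;
data want;
set wide;
/* caution: dept: lists the columns in the order PROC TRANSPOSE created them
   (first occurrence of each DEPT value), which is not necessarily alphabetical */
array _d(*) dept:;
do i=1 to dim(_d);
if missing(_d(i)) then _d(i) = 0;
end;
want = catt(of _d(*));
run;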
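Since the question asks for the positions to be alphabetical, the array can be declared from an explicitly sorted list of column names rather than from dept:. The following is a minimal sketch, assuming the have and wide data sets from the steps above; the macro variable deptvars and the 32-character length of dept_vect are assumptions, and the department values must be usable as part of a SAS variable name.

```sas
/* Build the column list in alphabetical order of department (deptvars is a hypothetical name) */
proc sql noprint;
  select distinct cats('dept', dept)
    into :deptvars separated by ' '
    from have
    order by 1;
quit;

data want;
  set wide;
  /* the array follows the order of the list, not the column order in WIDE */
  array _d(*) &deptvars;
  do i = 1 to dim(_d);
    if missing(_d(i)) then _d(i) = 0;
  end;
  length dept_vect $ 32;  /* assumes at most 32 departments */
  dept_vect = catt(of _d(*));
  keep id dept_vect;
run;
```

Declaring the array from the sorted list keeps the vector layout stable even when the first id in the data does not visit the departments in alphabetical order.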
Maybe TRANSREG can help with this.
data test;
input id dept $;
datalines;
1 A
1 D
1 B
1 C
2 C
3 D
4 A
5 C
5 D
;
run;
proc transreg;
id id;
model class(dept / zero=none);
output design out=dummy(drop=dept);
run;
proc print;
run;
proc summary nway;
class id;
output out=want(drop=_type_) max(dept:)=;
run;
proc print;
run;

SAS: Condense separate measurement variables across category

I have a data set whose variables represent two kinds of information: a variable measurement and a category.
For instance, Var1A measures the first variable (eg. blood pressure) of Category A (eg. male/female) whereas Var2B measures the second variable (eg. heart rate) of Category B (eg. male/female).
Key Var1A Var2A Var1B Var2B
--- ----- ----- ----- -----
002 1 2 3 4
031 5 6 7 8
028 9 10 11 12
I need each measurement variable to be condensed across the category type.
Key Type Var1 Var2
--- ---- ---- ----
002 A 1 2
002 B 3 4
028 A 9 10
028 B 11 12
031 A 5 6
031 B 7 8
The sorting of the condensed data set is unimportant to me.
What I have come up with works and yields the data sets seen above. I basically brute forced/fiddled my way to this solution. However, I wonder if there is a more direct/intuitive way to do it, possibly without needing to sort first and drop so many variables.
data have;
input key $ Var1A Var2A Var1B Var2B;
datalines;
002 1 2 3 4
031 5 6 7 8
028 9 10 11 12
;
run;
proc sort data = have out = step1_sort;
by key;
run;
proc transpose data = step1_sort out = step2_transpose;
by key;
run;
data step3_assign_type_and_variable (drop = _NAME_);
set step2_transpose ;
if _NAME_ = 'Var1A' then do;
variable = 'Var1';
type = 'A';
end;
else if _NAME_ = 'Var1B' then do;
variable = 'Var1';
type = 'B';
end;
else if _NAME_ = 'Var2A' then do;
variable = 'Var2';
type = 'A';
end;
else if _NAME_ = 'Var2B' then do;
variable = 'Var2';
type = 'B';
end;
run;
proc transpose data = step3_assign_type_and_variable
out = step4_get_want (drop = _NAME_);
var col1;
by key type;
id variable;
run;
I came up with the same method except replacing your brute force with cleaner substrings:
** use this step to replace your brute force code **;
data step3_assign_type_and_variable; set step2_transpose;
type = upcase(substr(_name_,length(_name_),1));
variable = propcase(substr(_name_,1,4));
drop _name_;
run;
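When the set of categories is small and known in advance, the sort and both transposes can be avoided altogether by writing one output row per category from a single DATA step. This is a minimal sketch, assuming the have data set from the question and exactly the two categories A and B:

```sas
data want;
  set have;
  length type $ 1;
  /* one output row per category, copying that category's measurements */
  type = 'A'; Var1 = Var1A; Var2 = Var2A; output;
  type = 'B'; Var1 = Var1B; Var2 = Var2B; output;
  keep key type Var1 Var2;
run;
```

The rows come out in the input order of key, with A before B within each key. For many categories or variables the transpose approach scales better, since nothing has to be listed by hand.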

SAS for following scenario (most frequent observation)

Assume I have a data-set D1 as follows:
ID ATR1 ATR2 ATR3
1 A R W
2 B T X
1 A S Y
2 C T E
3 D U I
1 T R W
2 C X X
I want to create a data-set D2 from this as follows
ID ATR1 ATR2 ATR3
1 A R W
2 C T X
3 D U I
In other words, Data-set D2 consists of unique IDs from D1. For each ID in D2, the values of ATR1-ATR3 are selected as the most frequent (of the respective variable) among the records in D1 with the same ID. For example ID = 1 in D2 has ATR1 = A (most frequent).
I have one solution, which is very clumsy. I simply sort copies of the data set D1 three times (e.g. by ID and ATR1) and remove duplicates. I later merge the three data sets to get what I want. However, I think there might be a more elegant way to do this. I have about 20 such variables in the original data set.
Thanks
/*
read and restructure so we end up with:
id attr_id value
1 1 A
1 2 R
1 3 W
etc.
*/
data a(keep=id attr_id value);
length value $1;
array attrs_{*} $ 1 attr_1 - attr_3;
infile cards;
input id attr_1 - attr_3;
do attr_id=1 to dim(attrs_);
value = attrs_{attr_id};
output;
end;
cards;
1 A R W
2 B T X
1 A S Y
2 C T E
3 D U I
1 T R W
2 C X X
;
run;
/* calculate frequencies of values per id and attr_id */
proc freq data=a noprint;
tables id*attr_id*value / out=freqs(keep=id attr_id value count);
run;
/* sort so the most frequent value per id and attr_id ends up at the bottom of the group.
if there are ties then it's a matter of luck which value we get */
proc sort data = freqs;
by id attr_id count;
run;
/* read and recreate the original structure. */
data b(keep=id attr_1 - attr_3);
retain attr_1 - attr_3;
array attrs_{*} $ 1 attr_1 - attr_3;
set freqs;
by id attr_id;
if first.id then do;
do i=1 to dim(attrs_);
attrs_{i} = ' ';
end;
end;
if last.attr_id then do;
attrs_{attr_id} = value;
end;
if last.id then do;
output;
end;
run;

Combining columns in SAS

I just started using SAS and I'm trying to combine columns.
I've got table mainData
A1 A2 A3 A4
1 4 7 10
2 5 8 11
3 6 9 12
I want to create a new table rearrangedData
Type Value
A1 1
A1 2
A1 3
A2 4
A2 5
A2 6
A3 7
A3 8
A3 9
A4 10
A4 11
A4 12
There must be a simple solution to this; I just can't figure it out. I'm thinking of writing a do loop, but what if I don't know the size of the table or the number of rows in a specific column? I can't figure out how I would get such information in SAS.
This somewhat unusual transformation can be done via a transpose and some array logic:
data have;
input A1 A2 A3 A4;
cards;
1 4 7 10
2 5 8 11
3 6 9 12
;
run;
proc transpose data = have out = tr name=type prefix = r;
run;
data want;
set tr;
array r{*} r:;
do i = 1 to dim(r);
value = r[i];
output;
end;
drop i r:;
run;
Also, this preserves the original order without requiring a sort.
Make a dummy variable, then transpose data.
data have;
set have;
id=_n_;
run;
proc transpose data=have out=temp;
by id;
var A1-A4;
run;
proc sort data=temp out=want(rename=(_name_=type col1=value) drop=id);
by _name_;
run;
If you want to preserve the original order, you could use the POINT= option on the SET statement to loop over the data set once per variable (column).
This data step reads the first observation just to get the variables defined. It then defines the array VALUES so that DIM(VALUES) gives the number of columns, and uses the POINT= and NOBS= options on a second SET statement to drive the inner loop over the rows. The VNAME() function returns the name of the current variable in the array.
data want ;
set have ;
array values _numeric_;
do col=1 to dim(values);
length type $32 value 8;
type=vname(values(col));
do row=1 to nobs ;
set have point=row nobs=nobs ;
value=values(col);
output;
keep type value;
end;
end;
stop;
run;