Setting variable names in IDGROUP output - SAS

Follow-up to: SAS - transpose multiple variables in rows to columns
I have the following code:
data have;
input CX_ID 1. TYPE $1. COUNT_RATE 1. SUM_RATE 2.;
datalines;
1A110
1B220
2A120
;
run;
proc summary data = have nway;
class cx_id;
output out=want (drop = _:)
idgroup(out[2] (count_rate sum_rate)= count sum);
run;
So this table:
CX_ID TYPE COUNT_RATE SUM_RATE
1 A 1 10
1 B 2 20
2 A 1 20
becomes
CX_ID COUNT_1 COUNT_2 SUM_1 SUM_2
1 1 2 10 20
2 1 . 20 .
This is perfect, but how do I set the names to be
Count_A Count_B Sum_A Sum_B
or, more generally, to whatever value is in the TYPE field of the HAVE table?
Thank you

A double PROC TRANSPOSE is dynamic and you can add a data step to customize the names easily.
*sample data;
data have;
input CX_ID 1. TYPE $1. COUNT 1. SUM 2.;
datalines;
1A110
1B220
2A120
;
run;
*transpose to long;
proc transpose data=have out=long;
by cx_id type;
run;
*transpose to wide (DELIMITER= puts an underscore between the ID values, e.g. COUNT_A);
proc transpose data=long out=wide(drop=_name_) delimiter=_;
by cx_id;
var col1;
id _name_ type;
run;
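The answer mentions adding a data step to customize the names. A minimal sketch of that idea, assuming the sample data above: build the final name (for example `Count_A`) from `_NAME_` and `TYPE` in the long data set, then use that single variable as the ID. The data set names `LONG2` and `WIDE2` and the variable `NEWNAME` are illustrative, not part of the original answer.

```sas
/* Sample data, renamed to COUNT and SUM as in the answer */
data have;
  input CX_ID 1. TYPE $1. COUNT 1. SUM 2.;
  datalines;
1A110
1B220
2A120
;
run;

/* Long form: one row per CX_ID, TYPE and original variable (_NAME_) */
proc transpose data=have out=long;
  by cx_id type;
run;

/* Hypothetical NEWNAME: e.g. _NAME_=COUNT and TYPE=A gives Count_A */
data long2;
  set long;
  length newname $32;
  newname = catx('_', propcase(_name_), type);
run;

/* Wide form: one column per distinct NEWNAME value */
proc transpose data=long2 out=wide2(drop=_name_);
  by cx_id;
  id newname;
  var col1;
run;
```

Because the name is built in an ordinary data step, any naming rule (prefixes, suffixes, case) can go in the `newname =` assignment without touching the transposes.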

Related

SAS Array Variable Name Based on Another Array

I have data in the following format:
data have;
input id rtl_apples rtl_oranges rtl_berries;
datalines;
1 50 60 10
2 10 30 80
3 40 8 1
;
I'm trying to create new variables that represent the percent of the sum of the RTL variables: PCT_APPLES, PCT_ORANGES, PCT_BERRIES. The problem is that I'm doing this within a macro, so the names and number of RTL variables will vary with each iteration, and the new variable names need to be generated dynamically.
This data step essentially gets what I need, but the new variables are named PCT1, PCT2, ..., PCTn, so it's difficult to know which RTL variable each PCT variable corresponds to.
data want;
set have;
array rtls[*] rtl_:;
total_sales = sum(of rtl_:);
call symput("dim",dim(rtls));
array pct[&dim.];
do i=1 to dim(rtls);
pct[i] = rtls[i] / total_sales;
end;
drop i;
run;
I also tried creating the new variable name by using a macro variable, but only the last variable in the array is created. In this case, PCT_BERRIES.
data want;
set have;
array rtls[*] rtl_:;
total_sales = sum(of rtl_:);
do i=1 to dim(rtls);
var_name = compress(tranwrd(upcase(vname(rtls[i])),'RTL','PCT'));
call symput("var_name",var_name);
&var_name. = rtls[i] / total_sales;
end;
drop i var_name;
run;
I have a feeling I'm overcomplicating this, so any help would be appreciated.
If you already have the list of names in a data set, use it to build the variable lists for your arrays.
proc sql noprint;
select distinct cats('RTL_',name),cats('PCT_',name)
into :rtl_list separated by ' '
, :pct_list separated by ' '
from dataset_with_names
;
quit;
data want;
set have;
array rtls &rtl_list;
array pcts &pct_list;
total_sales = sum(of rtls[*]);
do index=1 to dim(rtls);
pcts[index] = rtls[index] / total_sales;
end;
drop index ;
run;
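If the names are not stored in a separate data set, one option is to read them from the `DICTIONARY.COLUMNS` table in PROC SQL. This is a sketch that assumes the data set is `WORK.HAVE` and that every variable to convert starts with `RTL_`; it builds both lists in column order so the two arrays line up element by element. The DATA step is the same as in the answer above.

```sas
data have;
  input id rtl_apples rtl_oranges rtl_berries;
  datalines;
1 50 60 10
2 10 30 80
3 40 8 1
;
run;

/* Build matching lists, e.g. rtl_apples -> PCT_apples, in column order */
proc sql noprint;
  select name, cats('PCT_', substr(name, 5))
    into :rtl_list separated by ' ', :pct_list separated by ' '
    from dictionary.columns
    where libname = 'WORK' and memname = 'HAVE'
      and index(upcase(name), 'RTL_') = 1
    order by varnum;
quit;

data want;
  set have;
  array rtls &rtl_list;
  array pcts &pct_list;   /* new variables, created at compile time */
  total_sales = sum(of rtls[*]);
  do index = 1 to dim(rtls);
    pcts[index] = rtls[index] / total_sales;
  end;
  drop index;
run;
```

The macro variables are resolved before the DATA step is compiled, which is why this works where the CALL SYMPUT attempts in the question do not.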
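You can't create variables while a data step is executing: the set of variables is fixed when the step is compiled. This program uses PROC TRANSPOSE to create a new data set with the RTL_ variables "renamed" to PCT_, and the final data step reads it with IF 0 THEN SET to define the PCT_ variables.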
data have;
input id rtl_apples rtl_oranges rtl_berries;
datalines;
1 50 60 10
2 10 30 80
3 40 8 1
;;;;
run;
proc transpose data=have(obs=0) out=names;
var rtl_:;
run;
data pct;
set names;
_name_ = transtrn(_name_,'rtl_','PCT_');
y = .;
run;
proc transpose data=pct out=pct2;
id _name_;
var y;
run;
data want;
set have;
if 0 then set pct2(drop=_name_);
array _rtl[*] rtl_:;
array _pct[*] pct_:;
call missing(of _pct[*]);
total = sum(of _rtl[*]);
do i = 1 to dim(_rtl);
_pct[i] = _rtl[i]/total*1e2;
end;
drop i;
run;
proc print;
run;
If you only need to report the row percentages, PROC TABULATE can do that directly:
%let data=have; *name of the input data set (sorted by ID);
proc transpose data=&data out=&data.T;
by id;
var rtl_:;
run;
proc tabulate data=&data.T;
class id _name_;
var col1;
table
id=''
, _name_='Result'*col1=''*sum=''
_name_='Percent'*col1=''*rowpctsum=''
/ nocellmerge;
run;

Looping around a whole data step in SAS

I have the following code:
%let macroVar = Var1 Var2;
Data new1;
Set old1 (keep= count &macroVar.);
Run;
Proc means data = new1 nway missing noprint;
Class var1;
Var count;
Output out= out_var1 sum=;
Proc means data = new1 nway missing noprint;
Class var2;
Var count;
Output out= out_var2 sum=;
How can I write out the two proc means in one data step, using the macro variable I set up at the beginning?
Many thanks
Why not just do it in one PROC MEANS call?
%let varlist=sex age ;
proc means data=sashelp.class missing noprint;
class &varlist;
ways 1;
var height;
output out=want sum=;
run;
Result:
Obs    Sex    Age    _TYPE_    _FREQ_    Height
  1            11       1         2       108.8
  2            12       1         5       297.2
  3            13       1         3       184.3
  4            14       1         4       259.6
  5            15       1         4       262.5
  6            16       1         1        72.0
  7     F       .       2         9       545.3
  8     M       .       2        10       639.1
If you really must have separate output data sets, it should be faster to split them out of the summary above than to rerun PROC MEANS for each variable.
%macro sum_counts(varlist,data=,var=count);
%local i n;
%let n=%sysfunc(countw(&varlist));
proc means data=&data missing noprint ;
class &varlist;
ways 1;
var &var;
output out=_summary_ sum=;
run;
%do i=1 %to &n;
data new&i;
set _summary_;
where _type_=2**(&n-&i);
keep %scan(&varlist,&i) &var;
run;
%end;
%mend sum_counts;
Example:
263 options mprint;
264 %sum_counts(varlist=sex age,data=sashelp.class,var=height);
MPRINT(SUM_COUNTS): proc means data=sashelp.class missing noprint ;
MPRINT(SUM_COUNTS): class sex age;
MPRINT(SUM_COUNTS): ways 1;
MPRINT(SUM_COUNTS): var height;
MPRINT(SUM_COUNTS): output out=_summary_ sum=;
MPRINT(SUM_COUNTS): run;
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK._SUMMARY_ has 8 observations and 5 variables.
MPRINT(SUM_COUNTS): data new1;
MPRINT(SUM_COUNTS): set _summary_;
MPRINT(SUM_COUNTS): where _type_=2**(2-1);
MPRINT(SUM_COUNTS): keep sex height;
MPRINT(SUM_COUNTS): run;
NOTE: There were 2 observations read from the data set WORK._SUMMARY_.
WHERE _type_=2;
NOTE: The data set WORK.NEW1 has 2 observations and 2 variables.
MPRINT(SUM_COUNTS): data new2;
MPRINT(SUM_COUNTS): set _summary_;
MPRINT(SUM_COUNTS): where _type_=2**(2-2);
MPRINT(SUM_COUNTS): keep age height;
MPRINT(SUM_COUNTS): run;
NOTE: There were 6 observations read from the data set WORK._SUMMARY_.
WHERE _type_=1;
NOTE: The data set WORK.NEW2 has 6 observations and 2 variables.
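For comparison, here is a sketch of the loop the question literally asks for: one PROC MEANS per variable in the list, each writing `OUT_<variable>`. The macro name `RUN_MEANS` is hypothetical; because it reads the input once per variable, it is likely slower than splitting a single summary as the macro above does.

```sas
/* Hypothetical macro: one PROC MEANS per class variable in VARLIST */
%macro run_means(varlist, data=, var=count);
  %local i v;
  %do i = 1 %to %sysfunc(countw(&varlist));
    %let v = %scan(&varlist, &i);
    proc means data=&data nway missing noprint;
      class &v;
      var &var;
      output out=out_&v sum=;
    run;
  %end;
%mend run_means;

/* With the question's data this would be %run_means(&macroVar, data=old1) */
%run_means(sex age, data=sashelp.class, var=height)
```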

SAS transpose columns to row and values to columns

I have a summary table that I want to transpose, but I can't get my head around it. The columns should become the rows, and the values should become the columns.
Some explanation about the table. Each column represents a year. People can be in 3 groups: A, B or C. In 2016, everyone (100) is in group A. In 2017, 35 are in group A (5 + 20 + 10), 15 in B and 50 in C.
DATA have;
INPUT year2016 $ year2017 $ year2018 $ count;
DATALINES;
A A A 5
A A B 20
A A C 10
A B C 15
A C A 50
;
RUN;
I want to make a nice graph of how the groups evolve over the periods. So I want to end up with a table where the rows are the periods and the columns are the values (the three groups). Here is an example of the table I want:
(Image of the desired table, not available.)
I have tried different approaches, but I can't get what I want.
There may be a more direct way, but this is probably how I would do it.
DATA have;
INPUT year2016 $ year2017 $ year2018 $ count;
id + 1;
DATALINES;
A A A 5
A A B 20
A A C 10
A B C 15
A C A 50
;
RUN;
proc print;
proc transpose data=have out=want1 name=period;
by id count notsorted;
var year:;
run;
proc print;
run;
proc summary data=want1 nway completetypes;
class period col1;
freq count;
output out=want2(drop=_type_);
run;
proc print;
run;
proc transpose data=want2 out=want(drop=_name_) prefix=Group_;
by period;
var _freq_;
id col1;
run;
proc print;
run;
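One possibly more direct route is a single DATA step that accumulates the counts in a temporary array and writes one row per period. This sketch assumes exactly the three period variables in the question and only groups A, B and C; `WANT_DIRECT` and the array names are illustrative.

```sas
DATA have;
  INPUT year2016 $ year2017 $ year2018 $ count;
  DATALINES;
A A A 5
A A B 20
A A C 10
A B C 15
A C A 50
;
RUN;

data want_direct;
  length period $32;
  if 0 then set have;   /* define HAVE's variables (character years) at compile time */
  array yr[*] year2016 year2017 year2018;
  /* Running totals: rows = periods, columns = groups A, B, C */
  array tot[3, 3] _temporary_ (0 0 0 0 0 0 0 0 0);
  do until (eof);
    set have end=eof;
    do p = 1 to dim(yr);
      g = whichc(yr[p], 'A', 'B', 'C');   /* 1, 2 or 3; 0 if no match */
      if g > 0 then tot[p, g] = tot[p, g] + count;
    end;
  end;
  /* One output row per period */
  do p = 1 to dim(yr);
    period  = vname(yr[p]);
    Group_A = tot[p, 1];
    Group_B = tot[p, 2];
    Group_C = tot[p, 3];
    output;
  end;
  keep period Group_A Group_B Group_C;
run;
```

With the sample data this should give Group_A/Group_B/Group_C counts of 100/0/0 for year2016, 35/15/50 for year2017 and 55/20/25 for year2018. The trade-off is that the group letters are hard-coded, whereas the PROC SUMMARY/TRANSPOSE approach picks up new groups automatically.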

Isolate Patients with 2 diagnoses but diagnosis data is on different lines

I have a dataset of patient data with each diagnosis on a different line.
This is an example of what it looks like:
patientID diabetes cancer age gender
1 1 0 65 M
1 0 1 65 M
2 1 1 23 M
2 0 0 23 M
3 0 0 50 F
3 0 0 50 F
I need to isolate the patients who have a diagnosis of both diabetes and cancer; their unique patient identifier is patientID. Sometimes they are both on the same line, sometimes they aren't. I am not sure how to do this because the information is on multiple lines.
How would I go about doing this?
This is what I have so far:
PROC SQL;
create table want as
select patientID
, max(diabetes) as diabetes
, max(cancer) as cancer
, min(DOB) as DOB
from diab_dx
group by patientID;
quit;
data final; set want;
if diabetes GE 1 AND cancer GE 1 THEN both = 1;
else both =0;
run;
proc freq data=final;
tables both;
run;
Is this correct?
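If you want to learn more about the DATA step, look up how this works (this pattern, with the SET statement inside a DO UNTIL loop, is often called a DOW loop).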
data pat;
input patientID diabetes cancer age gender:$1.;
cards;
1 1 0 65 M
1 0 1 65 M
2 1 1 23 M
2 0 0 23 M
3 0 0 50 F
3 0 0 50 F
;;;;
run;
data both;
do until(last.patientid);
set pat; by patientid;
_diabetes = max(diabetes,_diabetes);
_cancer = max(cancer,_cancer);
end;
both = _diabetes and _cancer;
run;
proc print;
run;
Adding a HAVING clause at the end of the SQL query should do it.
PROC SQL;
create table want as
select patientID
, max(diabetes) as diabetes
, max(cancer) as cancer
, min(age) as age
from PAT
group by patientID
having calculated diabetes ge 1 and calculated cancer ge 1;
quit;
You may find that some coders, especially those from statistical backgrounds, prefer PROC MEANS over SQL or the DATA step for computing the maximum of each diagnosis flag.
proc means noprint data=have;
by patientID;
output out=want
max(diabetes) = diabetes
max(cancer) = cancer
min(age) = age
;
run;
or, when the same statistic is used for every variable:
proc means noprint data=have;
by patientID;
var diabetes cancer;
output out=want max= ;
run;
or
proc means noprint data=have;
by patientID;
var diabetes cancer age;
output out=want max= / autoname;
run;
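If you only want the patients that have both diagnoses, the PROC MEANS approach can filter with a WHERE= option on the output data set. A sketch, assuming the PAT sample data from the DATA step answer (sorted by patientID); `BOTH_DX` is an illustrative name.

```sas
data pat;
  input patientID diabetes cancer age gender :$1.;
  cards;
1 1 0 65 M
1 0 1 65 M
2 1 1 23 M
2 0 0 23 M
3 0 0 50 F
3 0 0 50 F
;
run;

/* Max of each 0/1 flag per patient; keep only patients with both flags */
proc means noprint data=pat;
  by patientID;
  var diabetes cancer;
  output out=both_dx(drop=_type_ _freq_ where=(diabetes = 1 and cancer = 1)) max=;
run;
```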

SAS, sum by row AND column

I want to do a sum calculation on a data set. The challenge is that I need both a row sum AND a column sum by ID: every row should carry the total of var1 and var2 over all rows with the same ID. Below is an example.
data have;
input ID var1 var2;
datalines;
1 1 1
1 3 2
1 2 3
2 0 5
2 1 3
3 0 1
;
run;
data want;
input ID var1 var2 sum;
datalines;
1 1 1 12
1 3 2 12
1 2 3 12
2 0 5 9
2 1 3 9
3 0 1 1
;
run;
Using SQL is cool, but SAS also has a nice DATA step!
proc sort data=have; by id; run;
data result;
set have;
by id;
retain sum 0;
if first.id then sum=0;
sum=sum+sum(var1,var2);
if last.id then output;
run;
proc sort data=result; by id; run;
data want;
*keep only ID and the total so the VAR1/VAR2 values in HAVE are not overwritten;
merge have result(keep=id sum);
by id;
run;
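The sort, summarize and merge steps can also be folded into one DATA step with a double DOW loop: the first loop reads an ID group and totals it, the second rereads the same group and writes each row with that total. A sketch, assuming HAVE is sorted by ID; `WANT_DOW` and `TOTAL` are illustrative names.

```sas
data have;
  input ID var1 var2;
  datalines;
1 1 1
1 3 2
1 2 3
2 0 5
2 1 3
3 0 1
;
run;

data want_dow(rename=(total=sum));
  /* Pass 1: read one ID group and accumulate var1 + var2 */
  do until (last.id);
    set have;
    by id;
    total = sum(total, var1, var2);
  end;
  /* Pass 2: reread the same group and write each row with its total */
  do until (last.id);
    set have;
    by id;
    output;
  end;
run;
```

TOTAL is not retained, so it is reset to missing at the start of each BY group, and the SUM function ignores that initial missing value.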
Which approach to use is up to you.
Use SQL to do it all in one step. Group only by ID, but keep var1 and var2 in the column selection; PROC SQL then remerges the group total back onto every row (the log shows a NOTE about remerging). This creates the same data as WANT.
proc sql noprint;
create table want as
select ID
, var1
, var2
, sum(var1) + sum(var2) as sum
from have
group by ID
;
quit;