I have a summary table which I want to transpose, but I can't get my head around it. The columns should become the rows, and the values should become the columns.
Some explanation about the table. Each column represents a year. People can be in 3 groups: A, B or C. In 2016, everyone (100) is in group A. In 2017, 35 are in group A (5 + 20 + 10), 15 in B and 50 in C.
DATA have;
INPUT year2016 $ year2017 $ year2018 $ count;
DATALINES;
A A A 5
A A B 20
A A C 10
A B C 15
A C A 50
;
RUN;
I want to be able to make a nice graph of the evolution of the groups through the different periods. So I want to end up with a table where the rows are the original columns (= the periods) and the columns are the values (= the 3 different groups). Please find an example of the table I want:
[Image: the desired table]
I have tried different approaches, but I can't get what I want.
There may be a more direct way, but this is probably how I would do it.
DATA have;
INPUT year2016 $ year2017 $ year2018 $ count;
id + 1;
DATALINES;
A A A 5
A A B 20
A A C 10
A B C 15
A C A 50
;
RUN;
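proc print;
run;
/* Step 1: one row per id and period; COL1 holds the group */
proc transpose data=have out=want1 name=period;
by id count notsorted;
var year:;
run;
proc print;
run;
/* Step 2: total count per period and group; COMPLETETYPES keeps empty combinations */
proc summary data=want1 nway completetypes;
class period col1;
freq count;
output out=want2(drop=_type_);
run;
proc print;
run;
/* Step 3: one row per period, one Group_ column per group */
proc transpose data=want2 out=want(drop=_name_) prefix=Group_;
by period;
var _freq_;
id col1;
run;
proc print;
run;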
I have a dataset like the one below. Using SAS, I need to assign an order variable to it based on descending count. When the category is missing, that row should always come last, whatever its count is. All other categories should be ordered by descending count, above the missing one.
Category Count
aa 10
bb 9
cc 8
6
ab 3
Desired output:
Category Count Order
aa 10 1
bb 9 2
cc 8 3
ab 3 4
6 5
You can use Proc DS2 to compute a sequence number for a result set.
Example:
data have;
input s $ f;
datalines;
aa 10
bb 9
cc 8
. 6
ab 3
;
proc ds2;
data want(overwrite=yes);
declare int sequence ;
method run();
set
{
select s,f from have
order by case when s is not null then f else -1e9-f end desc
};
sequence + 1;
end;
run;
quit;
Split the dataset into rows with a non-missing category and rows with a missing category, sort each part by descending value, then stack them on top of each other.
/* Sort the non-missing values */
proc sort data=have out=have_notmissing;
by descending value;
where NOT missing(category);
run;
/* Sort the missing values */
proc sort data=have out=have_missing;
by descending value;
where missing(category);
run;
/* Stack them on top of each other */
data want;
set have_notmissing
have_missing
;
rank+1;
run;
Output:
category value rank
aa 10 1
bb 9 2
cc 8 3
ab 3 4
6 5
Perhaps what you really want is a NOTSORTED format and a procedure that supports the PRELOADFMT option and ORDER=DATA.
data test;
input category $ count;
cards;
aa 10
bb 9
cc 8
. 6
ab 3
;;;;
run;
proc print;
run;
proc format;
value $cat(notsorted)
'bb'='Bee Bee'
'aa'='Aha'
'cc'='CC'
'dd'='D D'
'ab'='AB'
;
quit;
proc summary data=test nway missing;
class category / order=data preloadfmt;
format category $cat.;
freq count;
output out=summary(drop=_type_) / levels;
run;
proc print;
run;
Follow-up to: SAS - transpose multiple variables in rows to columns
I have the following code:
data have;
input CX_ID 1. TYPE $1. COUNT_RATE 1. SUM_RATE 2.;
datalines;
1A110
1B220
2A120
;
run;
proc summary data = have nway;
class cx_id;
output out=want (drop = _:)
idgroup(out[2] (count_rate sum_rate)= count sum);
run;
So this table:
CX_ID TYPE COUNT_RATE SUM_RATE
1 A 1 10
1 B 2 20
2 A 1 20
becomes
CX_ID COUNT_1 COUNT_2 SUM_1 SUM_2
1 1 2 10 20
2 1 . 20 .
Which is perfect, but how do I set the names to be
Count_A Count_B Sum_A Sum_B
Or, in general, whatever the value in the TYPE field of the have table is?
Thank you
A double PROC TRANSPOSE is dynamic and you can add a data step to customize the names easily.
*sample data;
data have;
input CX_ID 1. TYPE $1. COUNT 1. SUM 2.;
datalines;
1A110
1B220
2A120
;
run;
*transpose to long;
proc transpose data=have out=long;
by cx_id type;
run;
*transpose to wide;
proc transpose data=long out=wide;
by cx_id;
var col1;
id _name_ type;
run;
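If the goal is names such as COUNT_A rather than COUNTA, one option that avoids the extra data step is the DELIMITER= option of PROC TRANSPOSE, which places a separator between the values of the ID variables. A minimal sketch, assuming the long data set created above; the output name wide_named is only illustrative:

```sas
*transpose to wide, joining _NAME_ and TYPE with an underscore;
proc transpose data=long out=wide_named(drop=_name_) delimiter=_;
  by cx_id;
  var col1;
  id _name_ type;   /* builds COUNT_A, SUM_A, COUNT_B, SUM_B */
run;
```

A data step with a RENAME statement is still the way to go if the names need more than a separator.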
I need a summation column; however, both the retain and lag commands are inefficient.
There are a number of ways. You could use proc sql or proc means. Here is one way, using proc means:
data begin;
length person $3 sallary 5;
input person sallary;
datalines;
a 200
a 300
b 800
c 400
c 500
c 600
;
run;
proc means data=begin noprint;
by person; /*Handle each person as distinct subset*/
output out=Sal_by_person(drop= _type_ _freq_)
sum(sallary)=Total_sallary /*What we calculate and what we call them.*/
;
run;
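The PROC SQL route mentioned above could look roughly like this. It is a sketch against the same begin data set; the output name Sal_by_person_sql is an assumption chosen so it does not overwrite the PROC MEANS result:

```sas
proc sql;
  /* One row per person with the sum of sallary */
  create table Sal_by_person_sql as
  select person,
         sum(sallary) as Total_sallary
  from begin
  group by person;
quit;
```

Unlike the BY statement in PROC MEANS, GROUP BY does not require the input to be sorted by person.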
I have a dataset that has columns like:
a|b|c|d|e
and rows like:
1|3|5|7|9
2|4|6|8|10
How to change it to:
Char|Num|
a|1
a|2
b|3
b|4
c|5
c|6
d|7
d|8
e|9
e|10
Thank you in advance!
You can use PROC TRANSPOSE. The only gotcha is that, to get what you want, you need a BY variable. The easiest thing is to add a record number and use that as your BY.
data have;
input a b c d;
i = _n_;
datalines;
1 2 3 4
5 6 7 8
;
run;
proc transpose data=have out=want(drop=i);
by i;
var a b c d;
run;
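Applied to the five-column data from the question, the same idea might be sketched as follows. The data set names have5 and long5 are hypothetical; the rename and the final sort are additions to reproduce the Char/Num layout the question asks for:

```sas
data have5;
  input a b c d e;
  i = _n_;            /* record number used as the BY variable */
datalines;
1 3 5 7 9
2 4 6 8 10
;
run;

/* _NAME_ holds the original column name, COL1 its value */
proc transpose data=have5
               out=long5(drop=i rename=(_name_=Char col1=Num));
  by i;
  var a b c d e;
run;

/* Group the rows by original column, as in the desired output */
proc sort data=long5;
  by Char Num;
run;
```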
County AgeGrp Population
A 1 200
A 2 100
A 3 100
A All 400
B 1 200
So, I have a list of counties and I'd like to find the under-18 population as a percentage of the total population for each county. As an example, from the table above I'd like to add only the populations of AgeGrp 1 and 2 and divide by the 'All' population. In this case it would be 300/400. I'm wondering if this can be done for every county.
Let's call your SAS data set "HAVE" and say it has two character variables (County and AgeGrp) and one numeric variable (Population). And let's say you always have one observation in your data set for each County with AgeGrp='All', on which the value of Population is the total for the county.
To be safe, let's sort the data set by County and process it in another data step, creating a new data set named "WANT" with new variables for the county population (TOT_POP) and the sum of the two age group values you want (TOT_GRP), and calculating the proportion (AgeGrpPct):
proc sort data=HAVE;
by County;
run;
data WANT;
retain TOT_POP TOT_GRP 0;
set HAVE;
by County;
if first.County then do;
TOT_POP = 0;
TOT_GRP = 0;
end;
if AgeGrp in ('1','2') then TOT_GRP + Population;
else if AgeGrp = 'All' then TOT_POP = Population;
if last.County;
AgeGrpPct = TOT_GRP / TOT_POP;
keep County TOT_POP TOT_GRP AgeGrpPct;
output;
run;
Notice that the observation containing AgeGrp='All' is not really needed; you could just as well have created another variable to collect a running total for all age groups.
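A variant along those lines, ignoring the 'All' rows and accumulating the county total from the individual age groups, might look like this. It is only a sketch: the output name WANT_NOALL is hypothetical, and HAVE is assumed to be sorted by County already:

```sas
data WANT_NOALL;
  set HAVE;
  where AgeGrp ne 'All';      /* skip the summary rows */
  by County;
  if first.County then do;
    TOT_POP = 0;
    TOT_GRP = 0;
  end;
  TOT_POP + Population;       /* running total over all age groups */
  if AgeGrp in ('1','2') then TOT_GRP + Population;
  if last.County;             /* keep one row per county */
  AgeGrpPct = TOT_GRP / TOT_POP;
  keep County TOT_POP TOT_GRP AgeGrpPct;
run;
```

The sum statements retain TOT_POP and TOT_GRP across observations, so no explicit RETAIN is needed here.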
If you want a procedural approach, create a format for the under 18's, then use PROC FREQ to calculate the percentage. It is necessary to exclude the 'All' values from the dataset with this method (it's generally bad practice to include summary rows in the source data).
PROC TABULATE could also be used for this.
data have;
input County $ AgeGrp $ Population;
datalines;
A 1 200
A 2 100
A 3 100
A All 400
B 1 200
B 2 300
B 3 500
B All 1000
;
run;
proc format;
value $age_fmt '1','2' = '<18'
other = '18+';
run;
proc sort data=have;
by county;
run;
proc freq data=have (where=(agegrp ne 'All')) noprint;
by county;
table agegrp / out=want (drop=COUNT where=(agegrp in ('1','2')));
format agegrp $age_fmt.;
weight population;
run;