Consider the following data set, test:
Drug Quantity State Year
A
B
C
. . . .
How would I sum up the quantities of each drug grouped by state and year? Would it be something like:
data test;
by Drug State Year;
Total = sum(Quantity)
run;
You need something like this:
data test;
input Drug $ Quantity State $ Year;
datalines;
A 10 NY 2013
A 20 NY 2014
B 110 NY 2013
B 210 NY 2014
A 50 OH 2013
A 60 OH 2014
B 150 OH 2013
B 260 OH 2014
A 22 NY 2014
B 100 OH 2013
;
RUN;
proc means data= test SUM MAXDEC=0;
class Drug State Year;
var Quantity;
RUN;
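If the totals are needed in a data set rather than only in the printed report, a minimal sketch along the following lines may help. It assumes the test data set created above; the output data set name drug_totals and the variable name Total are illustrative choices, not part of the original answer.

```sas
/* Sketch: store one row per Drug/State/Year combination with its total. */
/* NWAY keeps only the rows for the full three-way classification.       */
proc means data=test nway noprint;
  class Drug State Year;
  var Quantity;
  /* drug_totals and Total are illustrative names */
  output out=drug_totals (drop=_type_ _freq_) sum=Total;
run;

proc print data=drug_totals;
run;
```

Without NWAY, the output data set would also contain rows for the overall total and for every lower-order combination of the class variables.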
Mucio's answer is good, but if you are after a PROC SQL version, here it is:
data test;
input Drug $ Quantity State $ Year;
datalines;
A 10 NY 2013
A 20 NY 2014
B 110 NY 2013
B 210 NY 2014
A 50 OH 2013
A 60 OH 2014
B 150 OH 2013
B 260 OH 2014
A 22 NY 2014
B 100 OH 2013
;
RUN;
PROC SQL;
CREATE TABLE WORK.QUERY_FOR_TEST AS
SELECT t1.Drug,
t1.State,
t1.Year,
/* SUM_of_Quantity */
(SUM(t1.Quantity)) AS SUM_of_Quantity
FROM WORK.TEST t1
GROUP BY t1.Drug,
t1.State,
t1.Year;
QUIT;
I have a dataset that looks like:
qy balance
2010 Q1 10
2010 Q1 10
2010 Q1 10
2010 Q2 20
2010 Q2 20
2010 Q3 20
2010 Q4 50
2011 Q1 100
2011 Q2 200
2011 Q3 300
and I would like to create a new variable which contains the sum per quarter. Desired output:
qy balance sum_balance
2010 Q1 10 30
2010 Q1 10 30
2010 Q1 10 30
2010 Q2 20 40
2010 Q2 20 40
2010 Q3 20 20
2010 Q4 50 50
2011 Q1 100 100
2011 Q2 200 200
2011 Q3 300 300
How can I do that?
Using proc sql and the sum function (PROC SQL remerges the group total back onto every row):
proc sql;
create table want as
select qy,
balance,
sum(balance) as sum_balance
from have
group by qy;
quit;
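The query above depends on PROC SQL remerging the summary statistic with the detail rows, which SAS reports with a note in the log. For readers who prefer to make that step explicit, here is a hedged sketch of an equivalent join. It assumes a data set named have with the variables qy and balance; the table name want_join and the aliases h and s are illustrative.

```sas
/* Sketch: compute the per-quarter total in a subquery, */
/* then join it back to the detail rows explicitly.     */
proc sql;
  create table want_join as
  select h.qy,
         h.balance,
         s.sum_balance
  from have as h
       inner join
       (select qy, sum(balance) as sum_balance
        from have
        group by qy) as s
       on h.qy = s.qy
  order by h.qy;
quit;
```

This form is also closer to what other SQL dialects accept, since most of them reject a non-grouped column such as balance in a grouped SELECT.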
Here is an alternative data step approach:
data have;
input qy $ 1-7 balance;
datalines;
2010 Q1 10
2010 Q1 10
2010 Q1 10
2010 Q2 20
2010 Q2 20
2010 Q3 20
2010 Q4 50
2011 Q1 100
2011 Q2 200
2011 Q3 300
;
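data want;
/* First pass: read one qy group and accumulate its total */
do until (last.qy);
set have;
by qy;
sum_balance + balance;
end;
/* Second pass: reread the same group and write each row with the total */
do until (last.qy);
set have;
by qy;
output;
end;
/* Reset the accumulator before the next group */
sum_balance=0;
run;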
data work.want2;
input Y M $ ID $ volume;
datalines;
2009 JAN A1 100
2009 FEB A1 20
2009 FEB A1 80
2009 JAN A2 100
2009 JAN A2 100
2009 FEB A2 20
2009 FEB A2 80
2009 JAN A3 100
2009 FEB A3 150
2009 MAR A3 100
2011 DEC A1 100
2011 DEC A1 20
2011 DEC A2 20
2011 DEC A3 120
2011 DEC A3 80
2011 OCT A1 100
2011 OCT A2 20
2011 OCT A2 100
;
proc print data=want2;
run;
/*Code 2--> to sum by Y M ID*/
PROC SQL;
create table want3 as SELECT
Y,
M,
ID,
sum(volume) AS sumvolume
FROM want2
GROUP BY Y, M ,ID;
QUIT;
/*Code 3 -->get sum by Y M*/
PROC SQL;
SELECT
Y,
M,
sum(sumvolume) AS sumvolume_MO
FROM want3
GROUP BY Y, M;
QUIT;
I have used PROC SQL (code 2) to sum by ID, Y and M. I want to add a new variable, the monthly volume, which depends on Y and M. I have used "code 3" to get those results.
Is it possible to combine code 2 and code 3 together to get the results as following? I always get errors.
Thanks in advance.
Y M ID sumvolume sumvolume_MO
2009 FEB A1 100 350
2009 FEB A2 100 350
2009 FEB A3 150 350
2009 JAN A1 100 400
2009 JAN A2 200 400
2009 JAN A3 100 400
2009 MAR A3 100 100
2011 DEC A1 120 340
2011 DEC A2 20 340
2011 DEC A3 200 340
2011 OCT A1 100 220
2011 OCT A2 120 220
Updated to reflect that the desired results use sum(volume) instead of the raw volume.
In general you would want to use subqueries. You could calculate the sums over the different groupings in separate subqueries and merge the results back together.
proc sql;
select a.y,a.m,a.id,a.sumvolume,b.sumvolume_mo
from
(select y,m,id,sum(volume) as sumvolume
from have
group by 1,2,3
) a
natural join
(select y,m,sum(volume) as sumvolume_mo
from have
group by 1,2
) b
;
quit;
But PROC SQL in SAS will also let you include non-group and non-aggregate variables in the SELECT and will automatically remerge the data for you. So you could get SUMVOLUME_MO by adding up the values of SUMVOLUME.
proc sql;
select y,m,id,sumvolume,sum(sumvolume) as sumvolume_mo
from
(select y,m,id,sum(volume) as sumvolume
from have
group by 1,2,3
)
group by 1,2
;
quit;
Thanks to Tom's answer, I can get the results with the following code.
PROC SQL;
create table newwant2 as
select y,m,id, sum(volume) as sumvolume_mo2,sumvolume_mo
from newwant
group by Y,M,id
;
QUIT;
Then I use the following code to delete the duplicate rows and keep the last row of each duplicate.
data newwant3;
set newwant2;
by Y M ID sumvolume_mo2 ;
if last.ID;
run;
proc print data=newwant3;
run;
I don't know where to start with this. I've tried listing the columns in every possible order but they are always listed horizontally. The dataset is:
data job2;
input year apply_count interviewed_count hired_count interviewed_mean hired_mean;
datalines;
2012 349 52 12 0.149 0.23077
2013 338 69 20 0.20414 0.28986
2014 354 70 18 0.19774 0.25714
;
run;
Here's an example of the proc report code for just one analysis variable:
proc report data = job2;
columns apply_count year;
define year / across " ";
define apply_count / analysis "Applied" format = comma8.;
run;
Ideally the final report would look like this:
2012 2013 2014
Applied 349 338 354
Interv. 52 69 70
Hired 12 20 18
Inter % 15% 20% 20%
Hired % 23% 29% 26%
I don't know if this is the best way to do this.
data job2;
input year apply_count interviewed_count hired_count interviewed_mean hired_mean;
datalines;
2012 349 52 12 0.149 0.23077
2013 338 69 20 0.20414 0.28986
2014 354 70 18 0.19774 0.25714
;;;;
run;
proc transpose data=job2 out=job3;
by year;
run;
data job3;
set job3;
length y atype $8;
y = propcase(scan(_name_,1,'_'));
atype = scan(_name_,-1,'_');
if atype eq 'mean' then substr(y,8,1)='%';
run;
proc print;
run;
proc report data=job3 list;
columns atype y year, col1 dummy;
define atype / group noprint;
define y / group order=data ' ';
define year / across ' ';
define dummy / noprint;
define col1 / format=12. ' ';
compute before atype;
xatype = atype;
endcomp;
compute after atype;
line ' ';
endcomp;
compute col1;
if xatype eq 'mean' then do;
call define('_C3_','format','percent12.');
call define('_C4_','format','percent12.');
call define('_C5_','format','percent12.');
end;
endcomp;
run;
I have data that is set up like this:
Pers Year Month Variable Value
AAA 2001 01 Var1 100
AAA 2001 01 Var2 200
AAA 2001 06 Var1 110
AAA 2001 06 Var2 210
AAA 2002 01 Var1 120
AAA 2002 01 Var2 .
BBB 2001 01 Var1 100
BBB 2001 01 Var2 200
BBB 2001 06 Var1 110
BBB 2001 06 Var2 210
BBB 2002 01 Var2 220
I would like data that looks like this:
Pers Year Month Var1 Var2
AAA 2001 01 100 200
AAA 2001 06 110 210
AAA 2002 01 120 .
BBB 2001 01 100 200
BBB 2001 06 110 210
BBB 2002 01 . 220
How can I do this in SAS, preferably with proc transpose or sql?
Note that in the input data, above, Person BBB is missing an observation for 2002-01 Var1, but the output data has returned a missing value in the last line, i.e. ".".
Using proc transpose is the obvious solution.
proc transpose data=yourdata out=yourdatat1(drop=_name_);
by pers year month;
id variable;
var value;
run;
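Note that the BY statement expects the input to be sorted (or at least grouped) by pers, year and month. If that is not guaranteed, a sort step can be placed first. The following is a sketch under the assumption that the input data set is called yourdata as above; yourdata_sorted and yourdata_wide are illustrative names.

```sas
/* Sketch: sort first so that BY-group processing in PROC TRANSPOSE works. */
proc sort data=yourdata out=yourdata_sorted;
  by pers year month;
run;

/* One output row per pers/year/month; values of VARIABLE become columns. */
proc transpose data=yourdata_sorted out=yourdata_wide (drop=_name_);
  by pers year month;
  id variable;
  var value;
run;
```

Any Variable value that does not occur within a BY group is simply left missing in the corresponding output column, which gives the "." shown for BBB in 2002-01.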
Using proc sql, you can use case when logic to summarize the data like below:
proc sql;
create table yourdatat2 as
select
pers,
year,
month,
sum(case when variable = 'Var1' then value else . end) as Var1,
sum(case when variable = 'Var2' then value else . end) as Var2
from
yourdata
group by
pers,
year,
month
;
quit;
I tried various date formats, but the output does not show any dates. What could be the issue?
data c;
input age gender income color$ doj$;
format doj date9.;
datalines;
19 1 14000 W 14/07/1988
45 2 45000 b 15/09/1956
34 2 56000 y 14/09/1967
33 1 45000 b 14/02/1956
;
run;
You are mixing things up a bit.
The date formats are to be applied on numeric data, not on text data.
So you should not read in doj as $ (text), but as a date (so a date informat).
Try DDMMYY10. for doj on your input statement:
data c;
input age gender income color$ doj ddmmyy10.;
format doj date9.;
datalines;
19 1 14000 W 14/07/1988
45 2 45000 b 15/09/1956
34 2 56000 y 14/09/1967
33 1 45000 b 14/02/1956
;
run;
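If the dates have already been read in as text and the data cannot easily be read again, the INPUT function can convert them afterwards. The following is only a sketch: the data set names c_char and c_fixed and the variable doj_text are made up for illustration.

```sas
/* Hypothetical starting point: doj was read in as a character variable. */
data c_char;
  input age gender income color $ doj_text :$10.;
  datalines;
19 1 14000 W 14/07/1988
45 2 45000 b 15/09/1956
;
run;

/* Convert the text to a numeric SAS date, then apply a date format. */
data c_fixed;
  set c_char;
  doj = input(doj_text, ddmmyy10.);  /* informat: text to number    */
  format doj date9.;                 /* format: number to display   */
  drop doj_text;
run;
```

The distinction is the same as in the answer above: the informat (ddmmyy10.) controls how the text is interpreted, while the format (date9.) only controls how the resulting number is displayed.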