Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 8 years ago.
Improve this question
I have a dataset with two columns having gender and birth_date, I need the output in such a form that there are two rows i.e. male and female (gender) and columns as months of a year. How do I do that in SAS?
Suppose:
Gender Birthdate
Male 01/10/1989
Female 02/12/1990
and so on..i have around 100K rows.
So I assume you want the number of people born in each month.
The first data set creates some test data.
data test;
format Gender $8. Birthdate date9.;
do i=1 to 5000;
gender="MALE";
Birthdate = today() + ceil(ranuni(123)*365);
output;
end;
do i=1 to 5000;
gender="FEMALE";
Birthdate = today() + ceil(ranuni(123)*365);
output;
end;
drop i;
run;
proc sort data=test;
by gender;
run;
data output(keep=gender month:);
set test;
by gender;
array Month_[12];
if first.gender then do;
do i=1 to 12;
month_[i] = 0;
end;
end;
month_[month(birthdate)] + 1;
if last.gender then
output;
run;
Creates
If I understood what you are asking and if the result set below looks like the output you wanted.. here is the code I used naming the initial data set as 'YYY'
data new(drop=year1 MonthName birthdate) ;
set yyy;
format new_col $14. MonthName year1 $12.;
year1 = compress(year(birthdate));
MonthName=put(birthdate,monname.);
new_col = trim(MonthName)||trim(year1);
run;
proc sort data=new;
by gender new_col;
run;
proc sql;
create table new2 as
select gender, New_col, count(new_col) as New_col2
from new
group by gender, New_col;
quit;
proc transpose data=new2
out=result
name=New_col;
by gender;
id New_col;
run;
Data final;
set result(drop=New_col);
run;
Result set:
Related
I have a dataset in SAS in which the months would be dynamically updated each month. I need to calculate the sum vertically each month and paste the sum below, as shown in the image.
Proc means/ proc summary and proc print are not doing the trick for me.
I was given the following code before:
`%let month = month name;
%put &month.;
data new_totals;
set Final_&month. end=end;
&month._sum + &month._final;
/*feb_sum + &month._final;*/
output;
if end then do;
measure = 'Total';
&month._final = &month._sum;
/*Feb_final = feb_sum;*/
output;
end;
drop &month._sum;
run; `
The problem is this has all the months hardcoded, which i don't want. I am not too familiar with loops or arrays, so need a solution for this, please.
enter image description here
It may be better to use a reporting procedure such as PRINT or REPORT to produce the desired output.
data have;
length group $20;
do group = 'A', 'B', 'C';
array month_totals jan2020 jan2019 feb2020 feb2019 mar2019 apr2019 may2019 jun2019 jul2019 aug2019 sep2019 oct2019 oct2019 nov2019 dec2019;
do over month_totals;
month_totals = 10 + floor(rand('uniform', 60));
end;
output;
end;
run;
ods excel file='data_with_total_row.xlsx';
proc print noobs data=have;
var group ;
sum jan2020--dec2019;
run;
proc report data=have;
columns group jan2020--dec2019;
define group / width=20;
rbreak after / summarize;
compute after;
group = 'Total';
endcomp;
run;
ods excel close;
Data structure
The data sets you are working with are 'difficult' because the date aspect of the data is actually in the metadata, i.e. the column name. An even better approach, in SAS, is too have a categorical data with columns
group (categorical role)
month (categorical role)
total (continuous role)
Such data can be easily filtered with a where clause, and reporting procedures such as REPORT and TABULATE can use the month variable in a class statement.
Example:
data have;
length group $20;
do group = 'A', 'B', 'C';
do _n_ = 0 by 1 until (month >= '01feb2020'd);
month = intnx('month', '01jan2018'd, _n_);
total = 10 + floor(rand('uniform', 60));
output;
end;
end;
format month monyy5.;
run;
proc tabulate data=have;
class group month;
var total;
table
group all='Total'
,
month='' * total='' * sum=''*f=comma9.
;
where intck('month', month, '01feb2020'd) between 0 and 13;
run;
proc report data=have;
column group (month,total);
define group / group;
define month / '' across order=data ;
define total / '' ;
where intck('month', month, '01feb2020'd) between 0 and 13;
run;
Here is a basic way. Borrowed sample data from Richard.
data have;
length group $20;
do group = 'A', 'B';
array months jan2020 jan2019 feb2020 feb2019 mar2019 apr2019 may2019 jun2019 jul2019 aug2019 sep2019 oct2019 oct2019 nov2019 dec2019;
do over months;
months = 10 + floor(rand('uniform', 60, 1));
end;
output;
end;
run;
proc summary data=have;
var _numeric_;
output out=temp(drop=_:) sum=;
run;
data want;
set have temp (in=t);
if t then group='Total';
run;
I have been trying to create a demographic table like below this but I can't seem append the different tables. Please advise on where I can make adjustments in the code.
Group A Group B
chort 1 cohort 2 cohort 3 subtotal cohort 4 cohort 5 cohort 6 subtotal
Age
n
mean
sd
median
min
Gender
n
female
male
Race
n
white
asian
hispanic
black
My Code:
PROC FORMAT;
value content
1=' '
2='Age'
3='Gender'
4='Race'
value sex
1=' n'
2=' female'
3=' male';
value race
1=' n'
2=' white'
3=' asian'
4=' hispanic'
5=' black';
value stat
1=' n'
2=' Mean'
3=' Std. Dev.'
4=' Median'
5=' Minimum';
RUN;
DATA testtest;
SET test.test(keep = id group cohort age gender race);
RUN;
data tottest;
set testtest;
output;
if prxmatch('m/COHORT 1|COHORT 2|COHORT 3/oi', cohort) then do;
cohort='Subtotal';
output;
end;
if prxmatch('m/COHORT 4|COHORT 5|COHORT 6/oi', cohort) then do;
cohort='Subtotal';
output;
end;
run;
data count;
if 0 then set testtest nobs=npats;
call symput('npats',put(npats,1.));
stop;
run;
proc freq data=tottest;
tables cohort /out=patk0 noprint;
tables cohort*sex /out=sex0 noprint;
tables cohort*race /out=race0 noprint;
run;
PROC MEANS DATA = testtest n mean std min median;
class cohort;
VAR age;
RUN;
I know that I would have to transpose it and out it in a report. But before I do that, how do I get the variable out of my proc means, proc freq, etc?
Suppose I have these data read into SAS:
I would like to list each unique name and the number of months it appeared in the data above to give a data set like this:
I have looked into PROC FREQ, but I think I need to do this in a DATA step, because I would like to be able to create other variables within the new data set and otherwise be able to manipulate the new data.
Data step:
proc sort data=have;
by name month;
run;
data want;
set have;
by name month;
m=month(lag(month));
if first.id then months=1;
else if month(date)^=m then months+1;
if last.id then output;
keep name months;
run;
Pro Sql:
proc sql;
select distinct name,count(distinct(month(month))) as months from have group by name;
quit;
While it's possible to do this in a data step, you wouldn't; you'd use proc freq or similar. Almost every PROC can give you an output dataset (rather than just print to the screen).
PROC FREQ data=sashelp.class;
tables age/out=age_counts noprint;
run;
Then you can use this output dataset (age_counts) as a SET input to another data step to perform your further calculations.
You can also use proc sql to group the variable and count how many are in that group. It might be faster than proc freq depending on how large your data is.
proc sql noprint;
create table counts as
select AGE, count(*) as AGE_CT from sashelp.class
group by AGE;
quit;
If you want to do it in a data step, you can use a Hash Object to hold the counted values:
data have;
do i=1 to 100;
do V = 'a', 'b', 'c';
output;
end;
end;
run;
data _null_;
set have end=last;
if _n_ = 1 then do;
declare hash cnt();
rc = cnt.definekey('v');
rc = cnt.definedata('v','v_cnt');
rc = cnt.definedone();
call missing(v_cnt);
end;
rc = cnt.find();
if rc then do;
v_cnt = 1;
cnt.add();
end;
else do;
v_cnt = v_cnt + 1;
cnt.replace();
end;
if last then
rc = cnt.output(dataset: "want");
run;
This is very efficient as it is a single loop over the data. The WANT data set contains the key and count values.
I have a proc report that groups and does subtotals. If I only have one observation in the group, the subtotal is useless. I'd like to either not do the subtotal for that line or not do the observation there. I don't want to go with a line statement, due to inconsistent formatting\style.
Here's some sample data. In the report the Tiki (my cat) line should only have one line, either the obs from the data or the subtotal...
data tiki1;
name='Tiki';
sex='C';
age=10;
height=6;
weight=9.5;
run;
data test;
set sashelp.class tiki1;
run;
It looks like you are trying do something that proc report cannot achieve in one pass. If however you just want the output you describe here is an approach that does not use proc report.
proc sort data = test;
by sex;
run;
data want;
length sex $10.;
set test end = eof;
by sex;
_tot + weight;
if first.sex then _stot = 0;
_stot + weight;
output;
if last.sex and not first.sex then do;
Name = "";
sex = "Subtotal " || trim(sex);
weight = _stot;
output;
end;
keep sex name weight;
if eof then do;
Name = "";
sex = "Total";
weight = _tot;
output;
end;
run;
proc print data = want noobs;
run;
This method manually creates subtotals and a total in the dataset by taking rolling sums. If you wanted do fancy formatting you could pass this data through proc report rather than proc print, Joe gives an example here.
I have to calculate the correlation and covariance for my daily sales values for an event window. The event window is of 45 day period and my data looks like -
store_id date sales
5927 12-Jan-07 3,714.00
5927 12-Jan-07 3,259.00
5927 14-Jan-07 3,787.00
5927 14-Jan-07 3,480.00
5927 17-Jan-07 3,646.00
5927 17-Jan-07 3,316.00
4978 18-Jan-07 3,530.00
4978 18-Jan-07 3,103.00
4978 18-Jan-07 3,026.00
4978 21-Jan-07 3,448.00
Now, for every store_id, date combination, I need to go back 45 days (there is more data for each combination in my original data set) calculate the correlation between sales and lag(sales) i.e. autocorrelation of degree one. As you can see, the date column is not continuous. So something like (date - 45) is not going to work.
I have gotten till this part -
data ds1;
set ds;
by store_id;
LAG_SALE = lag(sales);
IF FIRST.store_idTHEN DO;
LAG_SALE = .;
END;
run;
For calculating correlation and covariances -
proc corr data=ds1 outp=Corr
by store_id date;
cov; /** include covariances **/
var sales lag_sale;
run;
But how do I insert the event window for each date, store_id combination? My final output should look something like this -
id date corr cov
5927 12-Jan-07 ... ...
5927 14-Jan-07 ... ...
Here is what I've come up with:
First I convert the date to a SAS date, which is the number of days since Jan. 1 1960:
data ds;
set ds (rename=(date=old_date));
date = input(old_date, date11.);
drop old_date;
run;
Then compute lag_sale (I am using the same calculation you used in the question, but make sure this is what you want to do. For some observations the lag sale is the previous recorded date, but for some it is the same store_id and date, just a different observation.):
proc sort data=ds; by store_id; run;
data ds;
set ds;
by store_id;
lag_sale = lag(sales);
if first.store_id then lag_sale = .;
run;
Then set up the final data set:
data final;
length store_id 8 date 8 cov 8 corr 8;
if _n_ = 0;
run;
Then create a macro which takes a store_id and date and runs proc corr. The first part of the macro selects only the data with that store_id and within the past 45 days of the date. Then it runs proc corr. Then it formats proc corr how you want it and appends the results to the "final" data set.
%macro corr(store_id, date);
data ds2;
set ds;
where store_id = &store_id and %eval(&date-45) <= date <=&date
and lag_sale ne .;
run;
proc corr noprint data=ds2 cov outp=corr;
by store_id;
var sales lag_sale;
run;
data corr2;
set corr;
where _type_ in ('CORR', 'COV') and _name_ = 'sales';
retain cov;
date = &date;
if _type_ = 'COV' then cov = lag_sale;
else do;
corr = lag_sale;
output;
end;
keep store_id date corr cov;
run;
proc append base=final data=corr2 force; run;
%mend corr;
Finally run the macro for each store_id/date combination.
proc sort data=ds out=ds3 nodupkey;
by store_id date;
run;
data _null_;
set ds3;
call execute('%corr('||store_id||','||date||');');
run;
proc sort data=final;
by store_id date;
run;