I have a dataset with a variable that has the date of orders (MMDDYY10.). I need to extract the month and year to look at orders by month--year because there are multiple years. I am vaguely aware of the month and year functions, but how can I use them together to create a new month bin?
Ideally I would create a bin for each month/year so my output can look something like:
Date
Item 2011OCT 2011NOV 2011DEC 2012JAN ...
a 50 40 30 20
b 15 20 25 30
c 1 2 3 4
total
Here is a sample code I created:
data dsstabl;
set dsstabl;
order_month = month(AC_DATE_OF_IMAGE_ORDER);
order_year = year(AC_DATE_OF_IMAGE_ORDER);
order = compress(order_month||order_year);
run;
proc freq data
table item * _order;
run;
Thanks in advance!
When you are doing your analysis, use an appropriate format. MONYY. sounds like the right one. That will sort properly and will group values accordingly.
Something like:
proc means data=yourdata;
class datevar;
format datevar MONYY7.;
var whatever;
run;
So your table:
proc tabulate data=dsstabl;
class item datevar;
format datevar MONYY7.;
tables item,datevar*n;
run;
Or you can do it with nice trick.
data my_data_with_months;
set my_data;
MONTH = INTNX('month', NORMAL_DATE, 0, 'B');
run;
Always use it.
Related
I'm trying to merge a dataset to another table (hist_dataset) by applying one condition.
The dataset that I'm trying to merge looks like this:
Label
week_start
date
Value1
Value2
Ac
09Jan2023
13Jan2023
45
43
The logic that I'm using is the next:
If the value("week_start" column) of the first record is equal to today's week + 14 then merge the dataset with the dataset that I want to append.
If the value(week_start column) of the first record is not equal to today's week + 14 then do nothing, don't merge the data.
The code that I'm using is the next:
libname out /"path"
data dataset;
set dataset;
by week_start;
if first.week_start = intnx('week.2', today() + 14, 0, 'b') then do;
data dataset;
merge out.hist_dataset dataset;
by label, week_start, date;
end;
run;
But I'm getting 2 Errors:
117 - 185: There was 1 unclosed DO block.
161 - 185: No matching DO/SELECT statement.
Do you know how can make the program run correctly or do you know another way to do it?
Thanks,
'''
I cannot make heads or tails of what you are asking. So let me take a guess at what you are trying to do and give answer to my guesses.
Let's first make up some dataset and variable names. So you have an existing dateset named OLD that has key variables LABEL WEEK_START and DATE.
Now you have received a NEW dataset that has those same variables.
You want to first subset the NEW dataset to just those observations where the value of DATE is within 14 days of the first value of START_WEEK in the NEW dataset.
data subset ;
set new;
if _n_=1 then first_week=start_week;
retain first_week;
if date <= first_week+14 ;
run;
You then want to merge that into the OLD dataset.
data want;
merge old subset;
by label week_start date ;
run;
I want to use SAS and eg. proc report to produce a custom table within my workflow.
Why: Prior, I used proc export (dbms=excel) and did some very basic stats by hand and copied pasted to an excel sheet to complete the report. Recently, I've started to use ODS excel to print all the relevant data to excel sheets but since ODS excel would always overwrite the whole excel workbook (and hence also the handcrafted stats) I now want to streamline the process.
The task itself is actually very straightforward. We have some information about IDs, age, and registration, so something like this:
data test;
input ID $ AGE CENTER $;
datalines;
111 23 A
. 27 B
311 40 C
131 18 A
. 64 A
;
run;
The goal is to produce a table report which should look like this structure-wise:
ID NO-ID Total
Count 3 2 5
Age (mean) 27 45.5 34.4
Count by Center:
A 2 1 3
B 0 1 1
A 1 0 1
It seems, proc report only takes variables as columns but not a subsetted data set (ID NE .; ID =''). Of course I could just produce three reports with three subsetted data sets and print them all separately but I hope there is a way to put this in one table.
Is proc report the right tool for this and if so how should I proceed? Or is it better to use proc tabulate or proc template or...?
I found a way to achieve an almost match to what I wanted. First if all, I had to introduce a new variable vID (valid ID, 0 not valid, 1 valid) in the data set, like so:
data test;
input ID $ AGE CENTER $;
if ID = '' then vID = 0;
else vID = 1;
datalines;
111 23 A
. 27 B
311 40 C
131 18 A
. 64 A
;
run;
After this I was able to use proc tabulate as suggested by #Reeza in the comments to build a table which pretty much resembles what I initially aimed for:
proc tabulate data = test;
class vID Center;
var age;
keylabel N = 'Count';
table N age*mean Center*N, vID ALL;
run;
Still, I wonder if there is a way without introducing the new variable at all and just use the SAS counters for missing and non-missing observations.
UPDATE:
#Reeza pointed out to use the proc format to assign a value to missing/non-missing ID data. In combination with the missing option (prints missing values) in proc tabulate this delivers the output without introducing a new variable:
proc format;
value $ id_fmt
' ' = 'No-ID'
other = 'ID'
;
run;
proc tabulate data = test missing;
format ID $id_fmt.;
class ID Center;
var age;
keylabel N = 'Count';
table N age*(mean median) Center*N, (ID=' ') ALL;
run;
I have a sas datebase with something like this:
id birthday Date1 Date2
1 12/4/01 12/4/13 12/3/14
2 12/3/01 12/6/13 12/2/14
3 12/9/01 12/4/03 12/9/14
4 12/8/13 12/3/14 12/10/16
And I want the data in this form:
id Date Datetype
1 12/4/01 birthday
1 12/4/13 1
1 12/3/14 2
2 12/3/01 birthday
2 12/6/13 1
2 12/2/14 2
3 12/9/01 birthday
3 12/4/03 1
3 12/9/14 2
4 12/8/13 birthday
4 12/3/14 1
4 12/10/16 2
Thanks by ur help, i'm on my second week using sas <3
Edit: thanks by remain me that i was not finding a sorting method.
Good day. The following should be what you are after. I did not come up with an easy way to rename the columns as they are not in beginning data.
/*Data generation for ease of testing*/
data begin;
input id birthday $ Date1 $ Date2 $;
cards;
1 12/4/01 12/4/13 12/3/14
2 12/3/01 12/6/13 12/2/14
3 12/9/01 12/4/03 12/9/14
4 12/8/13 12/3/14 12/10/16
; run;
/*The trick here is to use date: The colon means everything beginning with date, comparae with sql 'date%'*/
proc transpose data= begin out=trans;
by id;
var birthday date: ;
run;
/*Cleanup. Renaming the columns as you wanted.*/
data trans;
set trans;
rename _NAME_= Datetype COL1= Date;
run;
See more from Kent University site
Two steps
Pivot the data using Proc TRANSPOSE.
Change the names of the output columns and their labels with PROC DATASETS
Sample code
proc transpose
data=have
out=want
( keep=id _label_ col1)
;
by id;
var birthday date1 date2;
label birthday='birthday' date1='1' date2='2' ; * Trick to force values seen in pivot;
run;
proc datasets noprint lib=work;
modify want;
rename
_label_ = Datetype
col1 = Date
;
label
Datetype = 'Datetype'
;
run;
The column order in the TRANSPOSE output table is:
id variables
copy variables
_name_ and _label_
data based column names
The sample 'want' shows the data named columns before the _label_ / _name_ columns. The only way to change the underlying column order is to rewrite the data set. You can change how that order is perceived when viewed is by using an additional data view, or an output Proc that allows you to specify the specific order desired.
Hi does anyone know how to calculate the standard deviation over the next four quarters for each quarter? Thanks :)
My attempt is below:
date1 is the sas date for the quarter in a year
Proc sql ; create table th.totalroll as
Select distinct permco, date1 ,
(select std(adjret) from th.returns1 where qtr between
intnx('quarter',qtr(date),0) and intnx('quarter', qtr(date),+3)) as
TOTALroll From th.returns1 group by permco ,date1;
QUIT;
It's hard to tell how close you are because I'm not entirely certain what your data looks like, but here's an example assuming you have more than one date in each quarter. Create sample data:
data have;
format date date9.;
do m = 1 to 128;
date = intnx('month','01JAN2008'd,m-1);
amount = round(ranuni(date)*10);
output;
end;
drop m;
run;
Using proc sql, create quarter variable (you might already have this variable?) and group by this variable. Use a having clause to restrict results to the first date of each quarter.
proc sql;
create table want as
select
yyq(year(t1.date),qtr(t1.date)) as quarter format=yyq.,
(select std(t2.amount)
from have t2
where t2.date >= yyq(year(t1.date),qtr(t1.date))
and t2.date < intnx('quarter',yyq(year(t1.date),qtr(t1.date)),4)) as stddev
from
have t1
group by
calculated quarter
having
t1.date = min(t1.date)
;
quit;
You should be able to adapt this to work for your data.
You can use proc expand if your dataset is already in quarterly. So something like this:
proc expand data=th.returns1
out=th.totalroll
from=quarter
to=quarter;
by permco date1;
id date;
convert adjret=TOTALroll / transformout=( MOVSTD 4 );
run;
Don't forget to sort you data first. And MOVSTD gives you backward moving standard deviation. You may need to shift the output stream back by 4 quarters if you want the forward moving STD.
Transformation Operations for proc expand:
http://support.sas.com/documentation/cdl/en/etsug/60372/HTML/default/viewer.htm#etsug_expand_sect026.htm
I have a SAS dataset similar to the one created here.
data have;
input date :date. count;
cards;
20APR2012 10
20APR2012 20
20APR2012 20
27APR2012 15
27APR2012 5
;
run;
proc sort data=have;
by date;
run;
I want to create a column containing the sum for each date, so it would look like
date total
20APR2012 50
27APR2012 20
I have tried using first. but I think my syntax is off. Thanks.
This is what proc means is for.
proc means data=have;
class date;
var count;
output out=want sum=total;
run;
The code below works to give you your desired result.
proc sql;
create table wanted_tab as
select
date format date9.,
sum(count) as Total
from have
group by date;
;
quit;