I have a dataset like this. It corresponds to a single person, and I have this type of data for multiple persons.
data test;
input date YYMMDD10. real_length min_length;
format date YYMMDD10.;
cards;
2000-02-23 1 7
2000-02-24 12 15
2000-03-07 15 7
2000-03-22 7 15
2000-03-29 13 7
2000-04-11 17 7
2000-04-28 . 7
run;
What I am looking for is this: if the interval between the dates on two consecutive lines (real_length) is less than a certain length (min_length), I want to replace the date on the next line with the previous date + min_length. So far this is not a problem, and here is the code I used to achieve it:
data test2;
set test;
format lagdate min_date YYMMDD10.;
retain lagmin lagdate;
if lag(real_length) < lag(min_length) and lag(real_length) ~= . then min_date = lagdate + lagmin;
else min_date = date;
lagdate = min_date;
lagmin = min_length;
run;
Which gives :
date min_date min_length
2000-02-23 2000-02-23 7
2000-02-24 2000-03-01 15
2000-03-07 2000-03-16 7
2000-03-22 2000-03-22 15
...
The problem is that the interval between two consecutive adjusted dates can now become less than the minimal length, e.g. 2000-03-22 - 2000-03-16 = 6 days < min_length = 7. I would like to get 2000-03-23 = 2000-03-16 + 7 (= min_length) instead of 2000-03-22, like this:
date min_date min_length
2000-02-23 2000-02-23 7
2000-02-24 2000-03-01 15
2000-03-07 2000-03-16 7
2000-03-22 2000-03-23 15
...
So I've tried this code, but it does not work... I believe the problem could be in the if condition.
data test2;
set test;
format lagdate min_date YYMMDD10.;
retain lagmin lagdate;
if (lag(real_length) < lag(min_length) and lag(real_length) ~= .) or (adjust_length < lag(min_length) and adjust_length ~=.) then min_date = lagdate + lagmin;
else min_date = date;
adjust_length = min_date - lagdate;
lagdate = min_date;
lagmin = min_length;
run;
Does anybody see why this isn't working, or do you have another way of doing this?
Thank you!
The problem is that each time you adjust one date, you have to move all the subsequent dates as well if they're bunched up together. I think you can do this by keeping a running total of how many days you've added on to all the previous rows and then adding on only what's needed after that to get to the min_length between dates:
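data want;
set test;
format t_min_date min_date yymmdd10.;
if _n_ = 1 then total_adj = 0;
/* earliest allowed date for the NEXT row, including all shifts so far */
t_min_date = date + min_length + total_adj;
/* this row's adjusted date is the value computed on the previous row */
min_date = lag1(t_min_date);
/* the first row has no predecessor, so keep its original date */
if _n_ = 1 then min_date = date;
/* sum statement: total_adj is retained and accumulates each shortfall */
total_adj + max(0,min_length - real_length);
run;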
Is that what you were aiming for?
N.B. you'll need to replace the if _n_ = 1 with some first.id and last.id logic to make this work for multiple individuals in the same dataset.
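One way to read the requirement is as a recurrence: each adjusted date is the later of the original date and the previous adjusted date plus the previous row's min_length. The sketch below is a minimal Python illustration of that reading, not a port of the SAS step above. The function name `enforce_min_gaps` is hypothetical, and the input rows are the first four rows of the question's sample data.

```python
from datetime import date, timedelta

def enforce_min_gaps(rows):
    """rows: list of (date, min_length) tuples in date order for one person.

    Returns the adjusted dates, where each date is at least the previous
    adjusted date plus the previous row's min_length.
    """
    adjusted = []
    for current, _ in rows:
        if adjusted:
            prev_min_length = rows[len(adjusted) - 1][1]
            earliest = adjusted[-1] + timedelta(days=prev_min_length)
            adjusted.append(max(current, earliest))
        else:
            adjusted.append(current)  # first row has no predecessor
    return adjusted

sample = [
    (date(2000, 2, 23), 7),
    (date(2000, 2, 24), 15),
    (date(2000, 3, 7), 7),
    (date(2000, 3, 22), 15),
]
result = enforce_min_gaps(sample)
for d in result:
    print(d.isoformat())
```

For these four rows the sketch reproduces the min_date column the question asks for. With several persons in one dataset, the function would be applied to each person's rows separately, which is the same idea as the first.id logic mentioned above.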
Related
I have a dataset like this:
DATA tmp;
INPUT
identifier $
d0101 d0102 d0103 d0104 d0105 d0106
d0107 d0108 d0109 d0110 d0111 d0112
;
DATALINES;
a 1 2 3 4 5 6 7 8 9 10 11 12
b 4 5 7 4 5 6 7 6 9 10 3 12
c 5 2 3 5 5 4 7 8 3 1 1 2
;
RUN;
And I'm trying to create a dataset like this:
DATA tmp;
INPUT
identifier $ day value
;
DATALINES;
a '01JAN2018'd 1
a '02JAN2018'd 2
a '03JAN2018'd 3
a '04JAN2018'd 4
a '05JAN2018'd 5
a '06JAN2018'd 6
a '07JAN2018'd 7
a '08JAN2018'd 8
a '09JAN2018'd 9
a '10JAN2018'd 10
a '11JAN2018'd 11
a '12JAN2018'd 12
b '01JAN2018'd 4
b '02JAN2018'd 5
b '03JAN2018'd 7
...
;
RUN;
I know the syntax for "melting" a dataset like this - I have completed a similar macro for columns that represent a particular value in each of the twelve months in a year.
What I'm struggling with is how to iterate through all days year-to-date (the assumption is that the have dataset has all days YTD as columns).
I'm used to Python, so something I might do there would be:
>>> import datetime
>>>
>>> def dates_ytd():
... end_date = datetime.date.today()
... start_date = datetime.date(end_date.year, 1, 1)
... diff = (end_date - start_date).days
... for x in range(0, diff + 1):
... yield end_date - datetime.timedelta(days=x)
...
>>> def create_date_column(dt):
... day, month = dt.day, dt.month
... day_fmt = '{}{}'.format('0' if day < 10 else '', day)
... month_fmt = '{}{}'.format('0' if month < 10 else '', month)
... return 'd{}{}'.format(month_fmt, day_fmt)
...
>>> result = [create_date_column(dt) for dt in dates_ytd()]
>>>
>>> result[:5]
['d1031', 'd1030', 'd1029', 'd1028', 'd1027']
>>> result[-5:]
['d0105', 'd0104', 'd0103', 'd0102', 'd0101']
Here is my SAS attempt:
%MACRO ITER_DATES_YTD();
DATA _NULL_;
%DO v_date = '01012018'd %TO TODAY();
%PUT d&v_date.;
* Will do "melting" logic here";
%END
%MEND ITER_DATES_YTD;
When I run this, using %ITER_DATES_YTD();, nothing is even printed to my log. What am I missing here? I basically want to iterate through "YTD" columns, like these d0101, d0102, d0103, ....
This is more a transposition problem than a macro / data step problem.
The core problem is that you have data in the metadata, meaning the 'date' is encoded in the column names.
Example 1:
Transpose the data, then use the d<mmdd> _name_ values to compute an actual date.
proc transpose data=have out=have_t(rename=col1=value);
by id;
run;
data want (keep=id date value);
set have_t;
* the variable name holds month/day metadata, so convert it into a regular date value;
date = input (cats(year(today()),substr(_name_,2)),yymmdd10.);
format date yymmdd10.;
run;
Example 2:
Do an array-based transposition. The D<mm><dd> variables are being used in the role of value_at_date, and are easily arrayed thanks to a consistent naming convention. The VNAME function retrieves the original variable name from the array reference, and a date value is computed from the <mm><dd> portion.
data want;
set have;
array value_at_date d:;
do index = 1 to dim(value_at_date);
date = input(cats(year(today()),substr(VNAME(value_at_date(index)),2)), yymmdd10.);
value = value_at_date(index);
output;
end;
format date yymmdd10.;
keep id date value;
run;
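Since the question mentions Python, the same "column name carries the date" idea can be sketched there as well. This is only an illustration of the reshaping logic, not a replacement for the SAS steps above. The function name `melt_row` is hypothetical, and the column names are assumed to follow the d<mm><dd> pattern from the question.

```python
from datetime import date

def melt_row(row, year):
    """Turn one wide record into (identifier, date, value) tuples.

    Column names are assumed to look like 'd<mm><dd>', e.g. 'd0102'.
    """
    long_rows = []
    for name, value in row.items():
        if name == "identifier":
            continue
        month, day = int(name[1:3]), int(name[3:5])
        long_rows.append((row["identifier"], date(year, month, day), value))
    return long_rows

# First three columns of row "a" from the question
wide = {"identifier": "a", "d0101": 1, "d0102": 2, "d0103": 3}
long_rows = melt_row(wide, 2018)
for record in long_rows:
    print(record)
```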
To iterate through dates in macro code, first convert the start date to its numeric SAS value, loop over the numbers, and then format each number back into the month/day text you need.
%macro iterateDates();
data _null_;
%do i = %sysFunc(inputN(01012018,mmddyy8.)) %to %sysFunc(today()) %by 1;
%* mmddyy4. writes <mm><dd>, matching the d<mm><dd> column names;
%put d%sysFunc(putN(&i, mmddyy4.));
%end;
run;
%mend iterateDates;
%iterateDates();
I think a date literal such as '01012018'd is only evaluated in a data step, not in macro code (and a valid date literal would be written '01JAN2018'd). Keep in mind that macro code is executed first, and only then does the data step run. You can think of it as building SAS code with macros and then running that code.
Hi, I am a beginner in SAS and I need help with this question.
I want to convert 201711 to 13th Nov 2017. I cannot work out how to do this.
Please help, and thanks in advance.
If this is for display purposes, and all date values are in the same format as the one in your question, then this should work.
First, create a format to display the months:
proc format lib=work;
value mon
1 = "Jan"
2 = "Feb"
3 = "Mar"
4 = "Apr"
5 = "May"
6 = "Jun"
7 = "Jul"
8 = "Aug"
9 = "Sep"
10 = "Oct"
11 = "Nov"
12 = "Dec"
;
run;
Then you substring the month and the year from your date variable and then apply the formats.
data have;
length full_date $20;
date = 201711;
mon = input(substrn(date,5,2),best.);
yr = input(substrn(date,1,4),best.);
full_date = compbl(put(mon,mon.)||put(yr,best.));
run;
If '201711' is just some text you have to convert, then the day number is missing, so it will have to be added. SAS stores dates as numbers, so it is useful to convert text dates to a SAS date value. The date can then be formatted as needed:
data want;
have = '201711'; /* given partial date */
add_day = '13'; /* day of month to add */
full_dt = cats(have,add_day); /* join day to partial date */
num_dt = input(full_dt,yymmdd8.); /* convert to a SAS date */
text_dt = put(num_dt,date9.); /* format as desired */
run;
As you are new to SAS I have commented what each part is doing, but it would be more useful for you to understand date handling / processing in SAS, e.g. the following is a useful start:
http://support.sas.com/documentation/cdl/en/lrcon/65287/HTML/default/viewer.htm#p1wj0wt2ebe2a0n1lv4lem9hdc0v.htm
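For comparison, the same three steps (append the day, parse to a real date, write it back out as text) can be sketched in Python. This is an illustration only. The day "13" is an assumption carried over from the SAS example, and the month abbreviations come from a fixed list so the result does not depend on locale settings.

```python
from datetime import datetime

partial = "201711"     # given year and month
day_of_month = "13"    # assumed day of month, as in the SAS example

# join the day to the partial date and parse it into a real date
parsed = datetime.strptime(partial + day_of_month, "%Y%m%d").date()

# write the date back out in a DATE9.-like layout (ddMONyyyy)
months = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]
text = f"{parsed.day:02d}{months[parsed.month - 1]}{parsed.year}"
print(text)
```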
I'm trying to manipulate my Dispensing_Date to give me the weeknum of the year ending on last Friday for each Date, can this be done? Here is what I have so far...
%let 1= 01012016;
%let 53 = 12302016;
**01 import whiteoak file;
proc import
datafile = "E:\Horizon\Adhoc\AH\whiteoak.xlsx"
out = whiteoak
dbms = XLSX
replace;
run;
** 02 remove dupes to ensure unique rx and fill;
proc sort nodup data=whiteoak;
by Rx_ Refill;
run;
** 03 Filter out holds;
data whiteoak;
set whiteoak;
where (Filled_Status="YES");
run;
** 04 create weekday variable;
data dates;
set whiteoak;
format Dispensing_Date MMDDYY8.;
run;
This is my best guess as to what you are asking.
data _null_;
x = today();
d = intnx('week.7',x,-1,'end');
put (_all_)(=weekdate.);
run;
x=Wednesday, January 4, 2017 d=Friday, December 30, 2016
Does this do what you want?
data weeks;
do date = '22DEC2016'd to '15JAN2017'd;
format date first_friday weekdate.;
sas_week=week(date);
first_friday= intnx('week.7',intnx('year',date,0,'b'),0,'e');
friday_week=1+int((7+date-first_friday)/7) ;
output;
end;
run;
If it does then apply it to your data:
data dates;
set whiteoak;
week = 1 + int((7+Dispensing_Date
- intnx('week.7',intnx('year',Dispensing_Date,0,'b'),0,'e'))/7);
run;
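To check what the week formula above produces, here is a small Python sketch that mirrors it. The helper names `first_friday` and `friday_week` and the `offset` parameter are hypothetical. The sketch shows that with the `7 +` offset as written, the first Friday of the year itself lands in week 2. An offset of 6 would put 01/01/2016 in week 1 and 12/30/2016 in week 53, which appears to match the macro variables in the question. Which convention is wanted is an assumption that should be confirmed.

```python
from datetime import date, timedelta

def first_friday(year):
    """First Friday on or after January 1 of the given year."""
    jan1 = date(year, 1, 1)
    # weekday(): Monday is 0, so Friday is 4
    return jan1 + timedelta(days=(4 - jan1.weekday()) % 7)

def friday_week(d, offset=7):
    """Mirror of 1 + int((offset + date - first_friday) / 7).

    offset + difference is never negative for offset 6 or 7, so floor
    division matches the truncation done by int() in the SAS formula.
    """
    days = offset + (d - first_friday(d.year)).days
    return 1 + days // 7

print(first_friday(2017))                           # first Friday of 2017
print(friday_week(date(2017, 1, 5)))                # Thursday before it
print(friday_week(date(2017, 1, 6)))                # the first Friday itself
print(friday_week(date(2016, 12, 30), offset=6))    # alternative offset
```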
I have monthly data with several observations per day. I have day, month and year variables. How can I retain data from only the first and the last 5 days of each month? I have only weekdays in my data, so the first and last five days of the month change from month to month, i.e. for Jan 2008 the first five days can be the 2nd, 3rd, 4th, 7th and 8th of the month.
Below is an example of the data file. I wasn't sure how to share this so I just copied some lines below. This is from Jan 2, 2008.
Would a variation of first.variable and last.variable work? How can I retain observations from the first 5 days and last 5 days of each month?
Thanks.
1 AA 500 B 36.9800 NH 2 1 2008 9:10:21
2 AA 500 S 36.4500 NN 2 1 2008 9:30:41
3 AA 100 B 36.4700 NH 2 1 2008 9:30:43
4 AA 100 B 36.4700 NH 2 1 2008 9:30:48
5 AA 50 S 36.4500 NN 2 1 2008 9:30:49
If you want to examine the data and determine the minimum 5 and maximum 5 values then you can use PROC SUMMARY. You could then merge the result back with the data to select the records.
So if your data has variables YEAR, MONTH and DAY you can make a new data set that has the top and bottom five days per month using simple steps.
proc sort data=HAVE (keep=year month day) nodupkey
out=ALLDAYS;
by year month day;
run;
proc summary data=ALLDAYS nway;
class year month;
output out=MIDDLE
idgroup(min(day) out[5](day)=min_day)
idgroup(max(day) out[5](day)=max_day)
/ autoname ;
run;
proc transpose data=MIDDLE out=DAYS (rename=(col1=day));
by year month;
var min_day: max_day: ;
run;
proc sql ;
create table WANT as
select a.*
from HAVE a
inner join DAYS b
on a.year=b.year and a.month=b.month and a.day = b.day
;
quit;
/****
get some dates to play with
****/
data dates(keep=i thisdate);
offset = input('01Jan2015',DATE9.);
do i=1 to 100;
thisdate = offset + round(599*ranuni(1)+1); *** within 600 days from offset;
output;
end;
format thisdate date9.;
run;
/****
BTW: intnx('month',thisdate,1) = first day of next month. Subtract 1 to get the last day
of the current month.
intnx('month',thisdate,0,"BEGINNING") = first day of the current month
****/
proc sql;
create table first5_last5 AS
SELECT
*
FROM
dates /* replace with name of your data set */
WHERE
/* replace all occurences of 'thisdate' with name of your date variable */
( intnx('month',thisdate,1)-5 <= thisdate <= intnx('month',thisdate,1)-1 )
OR
( intnx('month',thisdate,0,"BEGINNING") <= thisdate <= intnx('month',thisdate,0,"BEGINNING")+4 )
ORDER BY
thisdate;
quit;
Create some data with the desired structure;
Data inData (drop=_:); * forget all variables starting with an underscore*;
format date yymmdd10. time time8.;
_instant = datetime();
do _i = 1 to 1E5;
date = datepart(_instant);
time = timepart(_instant);
yy = year(date);
mm = month(date);
dd = day(date);
*just some more random data*;
letter = byte(rank('a') +floor(rand('uniform', 0, 26)));
*select week days*;
if weekday(date) in (2,3,4,5,6) then output;
_instant = _instant + 1E5*rand('exponential');
end;
run;
Count the days per month;
proc sql;
create view dayCounts as
select yy, mm, count(distinct dd) as _countInMonth
from inData
group by yy, mm;
quit;
Select the days;
data first_5(drop=_:) last_5(drop=_:);
merge inData dayCounts;
by yy mm;
_newDay = dif(date) ne 0;
retain _nrInMonth;
if first.mm then _nrInMonth = 1;
else if _newDay then _nrInMonth + 1;
if _nrInMonth le 5 then output first_5;
if _nrInMonth gt _countInMonth - 5 then output last_5;
run;
Use the INTNX() function. You can use INTNX('month',...) to find the beginning and ending days of the month and then use INTNX('weekday',...) to find the first 5 week days and last five week days.
You can convert your month, day, year values into a date using the MDY() function. Let's assume that you do that and create a variable called TODAY. Then to test if it is within the first 5 weekdays or last 5 weekdays of the month you could do something like this:
first5 = intnx('weekday',intnx('month',today,0,'B'),0) <= today
<= intnx('weekday',intnx('month',today,0,'B'),4) ;
last5 = intnx('weekday',intnx('month',today,0,'E'),-4) <= today
<= intnx('weekday',intnx('month',today,0,'E'),0) ;
Note that those ranges will include the week-ends, but it shouldn't matter if your data doesn't have those dates.
But you might have issues if your data skips holidays.
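If holidays are a concern, one option is to rank the dates that actually occur in the data instead of computing calendar weekdays. The Python sketch below illustrates that idea only. The function name `first_last_days` is hypothetical, and the sample dates (the weekdays of January 2008 with January 1 absent) are made up to match the example in the question.

```python
from collections import defaultdict
from datetime import date

def first_last_days(dates, n=5):
    """Return the set of dates that are among the first n or last n
    distinct dates observed in their (year, month)."""
    by_month = defaultdict(set)
    for d in dates:
        by_month[(d.year, d.month)].add(d)
    keep = set()
    for days in by_month.values():
        ordered = sorted(days)
        keep.update(ordered[:n])
        keep.update(ordered[-n:])
    return keep

# Hypothetical sample: weekdays of January 2008, with Jan 1 missing (holiday)
observed = [date(2008, 1, day) for day in range(2, 32)
            if date(2008, 1, day).weekday() < 5]
keep = first_last_days(observed)
selected = [d for d in observed if d in keep]
print([d.day for d in selected])
```

Because the ranking is based on the days present in the data, a missing holiday simply shifts the window to the next observed day.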
I have 5 columns. The columns are:
date
stock [a, b, c, d, ...]
qty_in [a fixed number, e.g. a quantity of 10 came in for the stock on 1/1/2015]
qty_out [quantity that went out or was sold]
final_qty (qty_in - qty_out)
There are over 100 stocks, with transactions covering more than 6 months. For each stock on each day, qty_in should include the previous day's final_qty for the same stock. For example, if qty_in on 2/1/2015 is 10, then it should display qty_in on 2/1/2015 + final_qty on 1/1/2015. How can I achieve this with SAS?
Run this in sas
data testfile;
input date $ 1-10 stock $ 11-16 qty_in $17-20 qty_out $21-23 final_qty $24-26;
datalines;
1/1/2015 a 10 0 10
1/1/2015 b 20 4 16
1/1/2015 c 32 23 9
2/1/2015 a 10 /*this value should be= qty_in(2/1/2015 + final_qty 1/1/2015 i.e. 10+10=20*/
2/1/2015 b 20 /*this should be 20+16=36*/
2/1/2015 c 32
;
If you want to do this in a data step, you first need to sort the data set by stock and by date. Also, start with just 4 columns and compute the final column in the data step:
data stockout5;
set stockin4;
retain FIN_QTY;
by stock date;
if (first.stock) then FIN_QTY = INQTY - OUTQTY;
else FIN_QTY = FIN_QTY + INQTY - OUTQTY;
run;
Let me know if this works for you. It would help if you supplied some test data showing what you are starting with and what you want to end up with. Your question is fine, but it's not very clear unless you've worked with financial data before (imo).
From start to finish, this should do what you're looking for. It's pretty straightforward; let me know if you don't understand something. Note that 0 is added in for missing out values.
Data stock4;
format date date9.;
date = '1jan2015'd;
stock = "a";
in = 10;
out = 0 ;
output;
date = "1jan2015"d;
stock = "b";
in = 20;
out = 4;
output;
date = "1jan2015"d;
stock ="c";
in =32;
out=23;
output;
date="2jan2015"d;
stock = "a";
in = 10;
out=0;
output ;
date="2jan2015"d;
stock ="b";
in = 20;
out=0;
output;
date ="2jan2015"d;
stock = "c";
in=32;
out=0;
output;
run;
proc sort data=stock4;
by stock date;
run;
data stock5;
set stock4;
retain FIN_QTY;
by stock date;
if (first.stock) then FIN_QTY = IN - OUT;
else FIN_QTY = FIN_QTY + IN - OUT;
run;
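The retain logic above amounts to a running balance per stock. As a cross-check of the expected numbers, here is a minimal Python sketch of the same calculation. The function name `running_final_qty` is hypothetical, and the rows reuse the values from the stock4 step, with dates written as ISO strings so that plain sorting orders them correctly.

```python
def running_final_qty(rows):
    """rows: list of (stock, date, qty_in, qty_out) tuples in any order.

    Returns (stock, date, final_qty) tuples sorted by stock then date,
    where final_qty carries the same stock's previous balance forward.
    """
    balance = {}
    result = []
    for stock, day, qty_in, qty_out in sorted(rows):
        # the first row of a stock starts from 0, like first.stock in SAS
        balance[stock] = balance.get(stock, 0) + qty_in - qty_out
        result.append((stock, day, balance[stock]))
    return result

rows = [
    ("a", "2015-01-01", 10, 0), ("b", "2015-01-01", 20, 4),
    ("c", "2015-01-01", 32, 23), ("a", "2015-01-02", 10, 0),
    ("b", "2015-01-02", 20, 0), ("c", "2015-01-02", 32, 0),
]
totals = running_final_qty(rows)
for row in totals:
    print(row)
```

For stock a this gives 10 and then 20, and for stock b 16 and then 36, which are the values the question asks for.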