I have a string variable in Stata called YEAR with format "aaaa" (e.g. 2011). I want to replace "aaaa" with "31decaaaa" and destring the obtained variable.
My feeling is that the best way to proceed could be firstly destringing the variable YEAR and then adding "31dec". To destring the variable YEAR I have tried the command date but it does not seem to work. Any suggestion?
It would be best to describe your eventual goal here, as use of destring just appears to be what you have in mind as the next step.
If your goal is, given a string variable year, to produce a daily date variable for 31 December in each year, then destring is not necessary. Here are three ways to do it:
gen date = daily("31 Dec" + year, "DMY")
gen date = date("31 Dec" + year, "DMY")
gen date = mdy(12, 31, real(year))
Incidentally, there is no likely gain for Stata use in daily dates 365 or 366 days apart, as they just create a time series that is mostly implicit gaps.
If your data are yearly, but just associated with the end of each calendar year, keep them as yearly and use a display format to show "31 Dec", or the equivalent, in output.
. di %ty!3!1_!D!e!c_CCYY 2015
31 Dec 2015
Detail. date() is a function, not a command, in Stata. We can't comment on "does not seem to work" as no details are given of what you tried or what happened. daily() is just a synonym for date().
Related
I'm working on a dataset in Stata
The first column is the name of the firm. the second column is the start date of this firm and the third column is the expiration date of this firm. If the expdate is missing, this firm is still in business. I want to create a variable that will record the number of firms at a given time. (preferably to be a monthly variable)
I'm really lost here. Please help!
Next time, try using dataex (ssc install dataex) rather than a screen shot, this is recommended in the Stata tag wiki, and will help others help you!
Here is an example for how to count the number of firms that are alive in each period (I'll use years, but point out where you can switch to month). This example borrows from Nick Cox's Stata journal article on this topic.
First, load the data:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long(firmID dt_start dt_end)
3923155 20080123 99991231
2913168 20070630 99991231
3079566 20000601 20030212
3103920 20020805 20070422
3357723 20041201 20170407
4536020 20120201 20170407
2365954 20070630 20190630
4334271 20110721 20191130
4334338 20110721 20170829
4334431 20110721 20190429
end
Note that my in my example data my dates are not in Stata format, so I'll convert them here:
tostring dt_start, replace
generate startdate=date(dt_start, "YMD")
tostring dt_end, replace
generate enddate=date(dt_end, "YMD")
format startdate enddate
Next make a variable with the time interval you'd like to count within:
generate startyear = year(startdate)
generate endyear = year(enddate)
In my dataset I have missing end dates that begin with '9999' while you have them as '.' I'll set these to the current year, the assumption being that the dataset is current. You'll have to decide whether this is appropriate in your data.
replace endyear = year(date("$S_DATE","DMY")) if endyear == 9999
Next create an observation for the first and last years (or months) that the firm is alive:
expand 2
by firmID, sort: generate year = cond(_n == 1, startyear, endyear)
keep firmID year
duplicates drop // keeps one observation for firms that die in the period they were born
Now expand the dataset to have an observation for every period between the start and end date. For this I use tsfill.
xtset firmID year
tsfill
Now I have one observation per existing firm in each period. All that remains is to count the observations by year:
egen entities = count(firmID), by(year)
drop firmID
duplicates drop
I want to create a variable with the age of credit. The data only has the date of the start of credit.
I create date variable (eg 2017-12-31) for default. Then, i want calculate age with date of the start credit.
I tried to search for an article about that, but no luck.
Thanks.
It seems like your data is daily. In that case what you need is:
gen current_date=date("20171231","YMD")
format current_date %td //this will be the variable from which age will be calculated
gen age=current_date-start_credit_date //again, assuming the start credit variable is daily
this gives the age variable as the number of days. If you want it as the number of months, you need to add:
gen current_month=mofd(current_date)
format current_month %tm
gen start_month=mofd(start_credit_date)
format start_month %tm
gen age_month=current_month-start_month
Hi to all and good time of a day!
Here is my case I need to solve I will very gratefull if you can help me.
I have some data set it contains only one variable date format.
Example:
01JAN2016
06JAN2016
15FEB2016
The second data set is days - holidays for a period 5 years.
Example:
01JAN2016
02JAN2016
and etc, all these days are not working days.
The case is I need to count number of working days from date for every observation from first data set till now. It seems that I need to count number of days
"Now date" minus Date(from first data set) and minus number of days from second data set with holidays (count(date) where Date(from first data set)< date < "Now"
You can define your own type of interval to use with SAS funcions intck and intnx. Here's how to do it:
First create a table of weekdays for whichever years you have holidays for, up to present (or a future) year.
Here we'll start by including all weekdays from 2014 to 2016. This is assuming you don't want to count weekend days. If that's not the case, just modify the code so that the condition "weekday(date) in (2:6)" is not applied. You'll get the full 365 days of the year.
data mon_fri;
do date = "01JAN2014"d to "31DEC2016"d;
if weekday(date) in (2:6) then output;
end;
format date date9.;
run;
Then we'll create a table having all those dates we just created, minus the holidays we have in the table Holidays. We'll place the table in a library called myLib, and rename the date column to "Begin" for compliance with SAS custom intervals.
libname myLib "some/place/on/your/drive";
data mylib.workdays(RENAME=(date=Begin));
merge mon_fri (in=weekday)
Holidays (in=holiday);
by date;
if weekday and not holiday then output;
run;
Now we set up a custom interval which we'll simply call "workdays".
options intervalds=(workdays=mylib.workdays);
From there, all you have left to do is something like this:
data dateCalculations;
set mydata;
numOfDays = intck("workdays", theDate, today());
run;
SAS will take care of counting the number of dates (lines in the workdays dataset) separating the startdate (column called theDate) from the enddate (today's date).
Et voilĂ !
This is wonderful and very helpful. I use two different SAS systems (both on remote Unix servers). Setting the intervalds option only seems to work on one of them. I copy/paste the same code and on the other nothing happens - no warning, no error, it simply doesn't work.
Here is how I'm setting it (download the CSV from Yahoo! Finance for the S&P500, daily data, starting January 1950):
PROC IMPORT DATAFILE="sp500_1950_2016.csv"
OUT=sp500_1950_2016
DBMS=DLM
REPLACE;
delimiter=',';
getnames=yes;
RUN;
data trading_days;
set sp500_1950_2016 (keep = date rename=(date=begin));
where year(begin) < 2017;
run;
options intervalds=(TradingDay=trading_days) ;
Then I call it like so to count number of observations I should have from fund inception to Dec 31, 2016 or when the fund closed, whichever is sooner:
data ops2; set operations_master; where ~missing(inception);
if missing(enddate) then enddate = '31dec2016'd;
datadays = INTCK('TradingDay',inception,enddate);run;
proc univariate; var datadays;run;quit;
On system 1, this works just fine. On system 2, I get 0 for the variable datadays. I've already checked to see if there is a sys admin override on setting the intervalds option, and there is not. Is there another reason why this might not work on a given system?
Does anybody knows the SAS function, which could return the year days count by the given date?
For example:
input value - date,
output value - days in the year(date)
I took the example from the docs and edited it to match your request
data _null;
startdate='01jan14'd;
enddate=DATE();
actual=datdif(startdate, enddate, 'act/act');
put actual;
run;
This should output 160 because that is the amount of days between today and Jan 1 (not counting today)
The start date is Jan 1 '14
The end date is DATE() which is 'today'
This uses the datdif function that is explained in detail in the link.
Since SAS dates are just numbers...you can manipulate them by simply adding and subtracting them.
Data _null_;
Diff= today() - '01Jan2014'd;
Put Diff=;
Run;
Returns 160 in the log.
I have three columns for each observation indicating day, month, and year. I would like to combine these three columns into one to indicate the date. I will later need to add a day or subtracts a day for observations, so I need to make sure the format of the new date column is date.
I'm programming this in Stata. I used mdy function, but it generates empty observations. And I'm not sure why it is not working.
Here is a snapshot of the data:
Stata function mdy should work if the format of "year" is changed from yy to yyyy. Thus we can add 1900 to the "year" variable and then use the function mdy to generate a new variable that indicates the date using three variables "day", "month", and "year". The following code shows how to use mdy function:
replace year = 1900+year
gen date = mdy(month,day,year)