Compute end time of an activity - stata

I have the beginning time of a surgery (new_time) in %tc format and I also have the length of the surgery (th_time) in minutes (e.g. 140 mins) and unformatted.
I would like to know how to proceed to add th_time to new_time in order to get the end time of the surgery.
Here is how I have formatted my variables so far:
gen time_temp = substr(intotheatre, strpos(intotheatre," ")+1, . )
gen new_time = clock(intotheatre, "DMY hms",2050)
format new_time %tc
generate hrs=hh(new_time)
generate mins=mm(new_time)
generate secs=ss(new_time)
drop if th_time < 0 | th_time == .
drop if theatre==""
sort theatre new_time
I have read Nick Cox's Stata Journal articles, but every attempt I made to generate end_time ended with a 'type mismatch' error from Stata.
Any suggestion would be appreciated!

If you have not already done so, it would benefit you to work your way through the guidance in help datetime, which is without a doubt the most visited documentation on my system. The second most visited is Chapter 24 (Working with dates and times) of the Stata User's Guide PDF, available from the PDF Documentation item on Stata's Help menu. Before working with dates and times, any Stata user should read the very detailed Chapter 24 thoroughly. After that, the help documentation will usually be enough to point the way. Some people may be able to remember everything without having to continually refer to the documentation, but I for one am not such a person.
With that said, your new_time variable should be created as a double
generate double new_time = clock(intotheatre, "DMY hms",2050)
as both the cited documentation and Nick Cox advise. You have new_time represented now as the number of milliseconds since 1/1/1960. Then converting the theatre time from minutes to milliseconds and adding
generate double end_time = new_time + ( th_time * 60 * 1000 )
format end_time %tc
should give you what you desire.
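As a quick check of the arithmetic, here is a minimal sketch on one made-up observation (the timestamp and the 140-minute duration are invented for illustration). It uses msofminutes(), which converts minutes to milliseconds and is equivalent to multiplying by 60 * 1000. The reason for double is precision: a float cannot hold millisecond counts of this size exactly, so a %tc value stored as float can be off by a minute or so.

```stata
clear
set obs 1
* made-up example values, not from the original data
generate str20 intotheatre = "11/01/2016 09:30:00"
generate th_time = 140

* parse the string into a %tc value, stored as double
generate double new_time = clock(intotheatre, "DMY hms")

* add the duration, converted from minutes to milliseconds
generate double end_time = new_time + msofminutes(th_time)

format new_time end_time %tc
list new_time th_time end_time
```

Here end_time should display as 11:50:00 on the same day, that is, 2 hours 20 minutes after the start.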

Related

Calculate the number of firms at a given month

I'm working on a dataset in Stata
The first column is the name of the firm. the second column is the start date of this firm and the third column is the expiration date of this firm. If the expdate is missing, this firm is still in business. I want to create a variable that will record the number of firms at a given time. (preferably to be a monthly variable)
I'm really lost here. Please help!
Next time, try using dataex (ssc install dataex) rather than a screenshot. This is recommended in the Stata tag wiki and will help others help you!
Here is an example of how to count the number of firms that are alive in each period (I'll use years, but point out where you can switch to months). This example borrows from Nick Cox's Stata Journal article on this topic.
First, load the data:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long(firmID dt_start dt_end)
3923155 20080123 99991231
2913168 20070630 99991231
3079566 20000601 20030212
3103920 20020805 20070422
3357723 20041201 20170407
4536020 20120201 20170407
2365954 20070630 20190630
4334271 20110721 20191130
4334338 20110721 20170829
4334431 20110721 20190429
end
Note that in my example data the dates are not in Stata format, so I'll convert them here:
tostring dt_start, replace
generate startdate=date(dt_start, "YMD")
tostring dt_end, replace
generate enddate=date(dt_end, "YMD")
format startdate enddate %td
Next make a variable with the time interval you'd like to count within:
generate startyear = year(startdate)
generate endyear = year(enddate)
In my dataset, missing end dates are coded with the year 9999, while you have them as '.'. I'll set these to the current year, the assumption being that the dataset is current. You'll have to decide whether this is appropriate in your data.
replace endyear = year(date("$S_DATE","DMY")) if endyear == 9999
Next create an observation for the first and last years (or months) that the firm is alive:
expand 2
by firmID, sort: generate year = cond(_n == 1, startyear, endyear)
keep firmID year
duplicates drop // keeps one observation for firms that die in the period they were born
Now expand the dataset to have an observation for every period between the start and end date. For this I use tsfill.
xtset firmID year
tsfill
Now I have one observation per existing firm in each period. All that remains is to count the observations by year:
egen entities = count(firmID), by(year)
drop firmID
duplicates drop
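For monthly counts, the same steps should carry over if the yearly variables are replaced by monthly dates. The following is only a sketch on two made-up firms: it assumes mofd() to turn daily dates into monthly dates and declares the time variable as monthly in xtset; the variable names startmonth, endmonth and month are my own.

```stata
clear
* two made-up firms, for illustration only
input long firmID str8(dt_start dt_end)
1 "20000601" "20000815"
2 "20000701" "20001002"
end

* monthly dates instead of years
generate startmonth = mofd(date(dt_start, "YMD"))
generate endmonth   = mofd(date(dt_end, "YMD"))

* one observation for the first and one for the last month of each firm
expand 2
bysort firmID: generate month = cond(_n == 1, startmonth, endmonth)
format month %tm
keep firmID month
duplicates drop

* fill in every month between first and last, per firm
xtset firmID month, monthly
tsfill

* count firms per month and keep one row per month
egen entities = count(firmID), by(month)
drop firmID
duplicates drop
sort month
list
```

With these two firms there should be one row per month from 2000m6 to 2000m10, with two firms alive in 2000m7 and 2000m8.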

xline option when date is formatted %th?

I'm doing a connected twoway plot with x-axis as dates formatted as %th with values 2011h1 to 2017h2. I want to put a vertical line at 2016h2 but nothing I've tried has worked.
xline(2016h2)
xline("2016h2")
xline(date==2016h2)
xline(date=="2016h2")
I'm thinking it might be because I formatted dates with
gen date = yh(year, half)
format date %th
I think this is a MWE:
age1820 date
10.42 2011h1
10.33 2011h2
11.66 2012h1
11.01 2012h2
14.29 2013h1
10.95 2013h2
12.42 2014h1
7.04 2014h2
7.07 2015h1
6.95 2015h2
4 2016h1
8.07 2016h2
5.98 2017h1
3.19 2017h2
graph twoway connected age1820 date, xline(2016h2)
Your example will not really work as written without some additional work. I think in future posts you may want to shoot for a fully working example to maximize the chance that you get a good answer quickly. This is why I made up some fake data below.
Try something like this:
clear
set obs 20
gen date = _n + 100
format date %th
gen age = _n*2
display %th 116
display %th 117
tw connected age date, xline(116 `=th(2018h2)') tline(2019h1)
The crux of the matter is that Stata deals with dates as integers that have a special label attached to them by the format command (but not a value label). For example, 0 corresponds to 1960h1. In other words, you need to do one of the following:
1. tell xline() the number that corresponds to the date you want;
2. use th() to figure out what that number is and force the evaluation inside xline();
3. use tline(), which is smart enough to understand dates.
I think the third is the best option.
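Applied to data shaped like the question's, this could look as follows. It is a sketch using only the last four rows of the posted data, and it assumes year and half are available as separate numeric variables, as the yh(year, half) call in the question suggests.

```stata
clear
* last four rows of the posted example, with year and half split out
input float age1820 int(year half)
4.00 2016 1
8.07 2016 2
5.98 2017 1
3.19 2017 2
end

generate date = yh(year, half)
format date %th

* the integer behind 2016h2: half-years since 1960h1
display th(2016h2)

* tline() understands the date literal directly
twoway connected age1820 date, tline(2016h2)

* xline() needs the number, so force evaluation of th()
twoway connected age1820 date, xline(`=th(2016h2)')
```

Both graph commands should draw the vertical line at the same position, since th(2016h2) is just the integer stored in date for that half-year.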

How to destring a date in Stata containing just the year?

I have a string variable in Stata called YEAR with format "aaaa" (e.g. 2011). I want to replace "aaaa" with "31decaaaa" and destring the obtained variable.
My feeling is that the best way to proceed could be firstly destringing the variable YEAR and then adding "31dec". To destring the variable YEAR I have tried the command date but it does not seem to work. Any suggestion?
It would be best to describe your eventual goal here, as use of destring just appears to be what you have in mind as the next step.
If your goal is, given a string variable year, to produce a daily date variable for 31 December in each year, then destring is not necessary. Here are three ways to do it:
gen date = daily("31 Dec" + year, "DMY")
gen date = date("31 Dec" + year, "DMY")
gen date = mdy(12, 31, real(year))
Incidentally, there is no likely gain for Stata use in daily dates 365 or 366 days apart, as they just create a time series that is mostly implicit gaps.
If your data are yearly, but just associated with the end of each calendar year, keep them as yearly and use a display format to show "31 Dec", or the equivalent, in output.
. di %ty!3!1_!D!e!c_CCYY 2015
31 Dec 2015
Detail. date() is a function, not a command, in Stata. We can't comment on "does not seem to work" as no details are given of what you tried or what happened. daily() is just a synonym for date().
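To make the two alternatives concrete, here is a small sketch on made-up values. The names date and yearly are illustrative; the string variable is called year as in the answer above.

```stata
clear
* made-up example values
input str4 year
"2011"
"2015"
end

* alternative 1: a daily date for 31 December of each year
generate date = mdy(12, 31, real(year))
format date %td

* alternative 2: keep the data yearly and store the year as a number
generate yearly = real(year)
format yearly %ty

list
```

The first variable is a daily date that can be used with other daily dates; the second keeps a regularly spaced yearly series, which is usually what tsset or xtset would want.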

In Stata, how can I change the format of a date from "2010-01-11 00:00:00" to "1/11/2010"?

I am currently trying to change the format of a date from "2010-01-11 00:00:00" to "01-11-2010" or "1/11/2010". Currently "2010-01-11 00:00:00" is stored as a string. I have tried to convert it using the date() function, but I never get a result that Stata can recognize as a date and sort on. Would anyone have any idea how to do this?
It's best if for future questions you post attempted code and why it's not working for you.
Maybe this works in your case:
clear all
set more off
*----- example data -----
set obs 1
gen dat = "2010-01-11 00:00:00"
describe
list
*----- what you want -----
gen double dat2 = clock(dat, "YMD hms")
format dat2 %tcNN-DD-CCYY
describe
list
Note that we go from string type to numeric type (double), and then adjust the display format.
See help format, help datetime and help datetime_display_formats.
Read also:
Stata tip 113: Changing a variable's format: What it does and does not mean
N. J. Cox. 2012.
Stata Journal Volume 12 Number 4.
http://www.stata-journal.com/article.html?article=dm0067
If you are ingesting time data in "2010-01-11 00:00:00" (SQL) format, then by default it is ingested into Stata as a str23.
If you would like it as a Stata datetime that you can manipulate, you could try the following (ingested_date_1 and so on being your date columns):
foreach sqltime in ingested_date_1 ingested_date_2 {
rename `sqltime' X
generate double `sqltime' = clock(X, "YMD hms")
drop X
format %tcDDmonCCYY_HH:MM:SS `sqltime'
}
This loops over multiple "dates": replace ingested_date_1 ingested_date_2 with your own column names. Each variable is converted and keeps its original name.
The dates are now in a time format that Stata recognises, %tc, based on the clock, so they sort in time order as you would expect, which the ingested strings did not.
Additionally, you may now change the display format to something you are comfortable reading. As Roberto says, this makes no difference to date manipulation; it only changes the displayed appearance. For example, to view the date as "01-11-2010":
format ingested_date_1 %tcNN-DD-CCYY
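If only the calendar date matters and the time of day can be dropped, one further option is to convert the datetime to a daily date with dofc() and give it a %td display format. This is a sketch on two made-up strings; the variable names stamp and day are illustrative.

```stata
clear
* made-up example strings in SQL datetime layout
input str19 dat
"2010-01-11 00:00:00"
"2009-12-31 23:59:59"
end

* string -> datetime (milliseconds since 1960), stored as double
generate double stamp = clock(dat, "YMD hms")
format stamp %tc

* datetime -> daily date, displayed in month/day/year order
generate day = dofc(stamp)
format day %tdNN/DD/CCYY

* numeric dates sort chronologically
sort stamp
list
```

After the sort, the 2009 observation comes first, and day holds an ordinary daily date that works with functions such as year() and month().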

Count number of living firms efficiently

I have a list of companies with start and end dates for each. I want to count the number of companies alive over time. I have the following code but it runs slowly on my large dataset. Is there a more efficient way to do this in Stata?
forvalues y = 1982/2012 {
forvalues m = 1/12 {
*display date("01-`m'-`y'","DMY")
count if start_dt <= date("01-`m'-`y'","DMY") & date("01-`m'-`y'","DMY") <= end_dt
}
}
One way is to use the inrange() function and to compute each date only once per iteration. In Stata, date variables are just integers, so you can easily operate on them.
forvalues y = 1982/2012 {
forvalues m = 1/12 {
local d = date("01-`m'-`y'","DMY")
count if inrange(`d', start_dt, end_dt)
}
}
This alone will save you a huge amount of time. For 50,000 observations (and made-up data):
. timer list 1
1: 3.40 / 1 = 3.3980
. timer list 2
2: 18.61 / 1 = 18.6130
timer 1 is with inrange, timer 2 is your original code. Results are in seconds. Run help inrange and help timer for details.
That said, maybe someone can suggest an overall better strategy.
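A small variation on the loop, offered only as a sketch: since the day, month and year are already numbers, mdy() can build the date directly and skip the string parsing done by date(). The two firms below are made up, and I have not timed this against the versions above.

```stata
clear
* two made-up firms, for illustration only
input long firmid str9(s e)
1 "01jan2000" "15feb2000"
2 "20jan2000" "31dec2000"
end
generate start_dt = date(s, "DMY")
generate end_dt   = date(e, "DMY")
format start_dt end_dt %td

forvalues y = 2000/2000 {
    forvalues m = 1/3 {
        * first day of the month as a daily date, no string parsing
        local d = mdy(`m', 1, `y')
        quietly count if inrange(`d', start_dt, end_dt)
        display %td `d' "   firms alive: " r(N)
    }
}
```

With this data the counts for 1 January, 1 February and 1 March 2000 should be 1, 2 and 1.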
Assuming a firm identifier firmid, this is another way to think about the problem, but with a different data structure. Make sure you have a saved copy of your dataset before you do this.
expand 2
bysort firmid : gen eitherdate = cond(_n == 1, start_dt, end_dt)
by firmid : gen score = cond(_n == 1, 1, -1)
sort eitherdate
gen living = sum(score)
by eitherdate : replace living = living[_N]
So,
We expand each observation to 2 and put both dates in a new variable, the start date in one observation and the end date in the other observation.
We assign a score that is 1 when a firm starts and -1 when it ends.
The number of firms is increased by 1 every time a firm starts and decreased by 1 every time one ends. We just need to sort by date and the number of firms is the cumulative sum of those scores. (EDIT: There is a fix for changes on the same date.)
This new data structure could be useful for other purposes.
There is a write-up at http://www.stata-journal.com/article.html?article=dm0068
EDIT:
Notes in response to @Roberto Ferrer (and anyone else who reads this):
I fixed a bad bug, which made this too difficult to understand. Sorry about that.
The dates used here are just the dates at which firms start and end. There is no evident point in evaluating the number of firms at any other date as it would just be the same number as the previous date used. If you needed, however, to interpolate to a grid of dates, copying the previous count would be sufficient.
It is important not to confuse the Stata function sum() which returns the cumulative sum with any egen function. The impression that egen's total() is an alternative here was a side-effect of my bug.
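To see the expand-and-score approach at work, here is a sketch on three made-up firms, two of which start on the same date. Note one consequence of the scoring as written: a firm is subtracted on its end date, so it is no longer counted as living on that date.

```stata
clear
* three made-up firms, for illustration only
input long firmid str9(s e)
1 "01jan2000" "31dec2001"
2 "01jun2000" "30jun2002"
3 "01jun2000" "31dec2001"
end
generate start_dt = date(s, "DMY")
generate end_dt   = date(e, "DMY")

* two observations per firm: one for the start, one for the end
expand 2
bysort firmid : generate eitherdate = cond(_n == 1, start_dt, end_dt)
by firmid : generate score = cond(_n == 1, 1, -1)
format eitherdate %td

* running total of +1/-1 scores in date order
sort eitherdate
generate living = sum(score)

* same-date fix: every observation on a date gets that date's final total
by eitherdate : replace living = living[_N]

list firmid eitherdate score living, sepby(eitherdate)
```

The running total should be 1 on 01jan2000, 3 on 01jun2000, 1 on 31dec2001 and 0 on 30jun2002.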