In Stata, how do I convert a date of the form
09mar2005 00:00:00
to a month-year variable?
If it matters, the date format is %tc.
What I have in mind is to plot monthly averages (instead of the daily average I have) of variables across time.
To get where you are now, you or somebody else may have done something like this:
clear
set obs 1
gen earlier = "09mar2005 00:00:00"
gen double nowhave = clock(earlier, "DMY hms")
format nowhave %tc
list
     +-----------------------------------------+
     |            earlier              nowhave |
     |-----------------------------------------|
  1. | 09mar2005 00:00:00   09mar2005 00:00:00 |
     +-----------------------------------------+
Note that a string date and a numeric date-time variable with appropriate date-time format %tc just look the same when you list them, but they are quite different beasts.
To get where you want to be -- with a monthly date -- you convert from clock (date-time) to daily to monthly:
gen mdate = mofd(dofc(nowhave))
format mdate %tm
list
     +--------------------------------------------------+
     |            earlier              nowhave    mdate |
     |--------------------------------------------------|
  1. | 09mar2005 00:00:00   09mar2005 00:00:00   2005m3 |
     +--------------------------------------------------+
All is documented at help datetime. The function names stand for month of daily date and daily date of clock.
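For anyone who wants to check the arithmetic outside Stata, here is a minimal Python sketch of what the two conversions do, assuming the usual Stata conventions (a %tc value counts milliseconds since 01jan1960 00:00:00, a %td value counts days since 01jan1960, a %tm value counts months since 1960m1). The Python functions clock_ms, dofc, mofd and tm_label are hypothetical stand-ins named after the Stata functions, not Stata's own code.

```python
from datetime import date, datetime

STATA_EPOCH = datetime(1960, 1, 1)
MS_PER_DAY = 24 * 60 * 60 * 1000


def clock_ms(dt):
    """Milliseconds since 01jan1960 00:00:00 (what a %tc variable stores)."""
    delta = dt - STATA_EPOCH
    return delta.days * MS_PER_DAY + delta.seconds * 1000 + delta.microseconds // 1000


def dofc(ms):
    """Stand-in for Stata's dofc(): days since 01jan1960 from a clock value."""
    return ms // MS_PER_DAY


def mofd(days):
    """Stand-in for Stata's mofd(): months since 1960m1 from a daily date."""
    d = date.fromordinal(STATA_EPOCH.toordinal() + days)
    return (d.year - 1960) * 12 + (d.month - 1)


def tm_label(months):
    """Render a monthly date the way the default %tm format shows it."""
    years, month0 = divmod(months, 12)
    return f"{1960 + years}m{month0 + 1}"


nowhave = clock_ms(datetime(2005, 3, 9, 0, 0, 0))
mdate = mofd(dofc(nowhave))
print(dofc(nowhave))    # 16504
print(mdate)            # 542
print(tm_label(mdate))  # 2005m3
```

The point of the sketch is only that the monthly date is a plain integer, so averaging by mdate groups every day of March 2005 together.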
Related
Objective: I would like to obtain the difference between current and previous sessions based on date slicers.
I want the output to be 4 columns as such:
Date
Current Sessions (see measure below)
Previous Sessions (see measure below)
Difference (no measure calculated yet).
Situation:
I currently have two measures
Current Sessions: SUM(Sales[Sessions])
Previous Sessions (thanks to #Alexis Olson):
VAR datediffs =
    DATEDIFF (
        CALCULATE ( MAX ( 'Date'[Date] ) ),
        CALCULATE ( MAX ( 'Previous Date'[Date] ) ),
        DAY
    )
RETURN
    CALCULATE (
        SUM ( Sales[Sessions] ),
        USERELATIONSHIP ( 'Previous Date'[Date], 'Date'[Date] ),
        DATEADD ( 'Date'[Date], datediffs, DAY )
    )
I have three tables.
Sales
Date
Previous Date (carbon copy of Date table)
My Previous Date table has a 1:1 inactive relationship with the Date table. The Date table has a one-to-many active relationship with my Sales table.
I have two slicers at all times, comparing the same number of days from different time periods (e.g. Jan 1st to Jan 7th 2019 vs Dec 25th to Dec 31st 2019).
If I put current sessions, previous sessions and a date column from any of the three tables in a table, this is the output I am after:
+----------+------------------+-------------------+------------+
|   date   | current sessions | previous sessions | difference |
+----------+------------------+-------------------+------------+
| Jan 8th  |            10000 |             70000 |       3000 |
| Jan 9th  |            20000 |             10000 |      10000 |
| Jan 10th |            15000 |             16000 |      -1000 |
| Jan 11th |            14000 |             12000 |       2000 |
| Jan 12th |            12000 |             14000 |      -2000 |
| Jan 13th |            11000 |             16000 |      -5000 |
| Jan 14th |            15000 |             18000 |      -3000 |
+----------+------------------+-------------------+------------+
When I put the Sessions date on the table along with sessions and previous sessions, I get the right session amounts for each day, but the previous session amounts don't calculate correctly, I assume because they are being filtered by the date rows.
How can I override that table filter and force the measure to return the exact previous session amounts? Basically, I want both results appended to each other. The following shows my problem: the previous session amount is the same on each day and is basically the amount of dec 31st jan 2018, because the max date is different for each row, but I want it to be based on the slicer.
The mistake was in the first part of the datediffs variable in the Previous Sessions measure. It should be:
CALCULATE(LASTDATE('Date'[Date]),ALLSELECTED('Date'))
This forces the measure to use the last date of the whole slicer selection on every row, overriding the date filter that each row applies.
I have a variable date like this:
I want to calculate how many days have passed since, say, Jan 1 of 1960.
However, doing this by hand is tedious. Also, February does not have the same number of days every year.
What I've been trying is basically looking up every single calendar, calculating how many days there are in each year, recognizing a string like jan as month 1, and so on.
Is there a short and efficient way to do this?
You need to use the daily() or date() function:
display date("1/1/2012", "DMY") - date("1/1/1960", "DMY")
18993
More generally, if you have a string variable with dates:
clear
input str10 date1
"01/01/2012"
"01/01/2011"
"01/01/2014"
"19/12/2014"
end
generate date2 = date(date1, "DMY") - date("1/1/1960", "DMY")
list
     +--------------------+
     |      date1   date2 |
     |--------------------|
  1. | 01/01/2012   18993 |
  2. | 01/01/2011   18628 |
  3. | 01/01/2014   19724 |
  4. | 19/12/2014   20076 |
     +--------------------+
If the variable containing the dates is numeric:
clear
input date1
18993
18628
19724
20076
end
format %tdDD/NN/CCYY date1
generate date2 = date1 - date("1/1/1960", "DMY")
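As a cross-check outside Stata, the same day counts can be reproduced in a few lines of Python. This is only a sketch: days_since_1960 is a hypothetical helper name, and it assumes day/month/year strings like the ones in the example above.

```python
from datetime import datetime

EPOCH = datetime(1960, 1, 1)


def days_since_1960(text, fmt="%d/%m/%Y"):
    """Days elapsed between 1 January 1960 and the date written in `text`."""
    return (datetime.strptime(text, fmt) - EPOCH).days


for s in ["01/01/2012", "01/01/2011", "01/01/2014", "19/12/2014"]:
    print(s, days_since_1960(s))
# 01/01/2012 18993
# 01/01/2011 18628
# 01/01/2014 19724
# 19/12/2014 20076
```

Because Stata's daily dates already count days from 01jan1960, subtracting date("1/1/1960", "DMY") subtracts zero; it matters only if you change the reference date.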
I have a data set that has data sorted by months and years. I want to destring the month variable so that I can ultimately create one date variable, but as they are all labeled as January, February, etc. how do I destring the variable?
You don't. That's a job for date functions. All are documented, e.g. via help datetime.
destring is for numbers that happen to be read as string variables so that typical entries might be "42" and "666". Import as string usually arises when the variable includes metadata (e.g. header lines), or non-Stata flags for missings (e.g. "NA"), or some other non-numeric characters, often in as few as one observation. Import from MS Excel is a common cause, as spreadsheet users tend to be loose on sprinkling text in numeric data columns.
A variable with values such as "January" doesn't qualify. It's in your mind that month names map on to month numbers, but destring doesn't share that knowledge.
Date functions have this job:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str8 month float year
"January" 2017
"February" 1942
end
gen mdate = monthly(month + string(year), "MY")
list
     +-------------------------+
     |    month   year   mdate |
     |-------------------------|
  1. |  January   2017     684 |
  2. | February   1942    -215 |
     +-------------------------+
format mdate %tm
list
     +--------------------------+
     |    month   year    mdate |
     |--------------------------|
  1. |  January   2017   2017m1 |
  2. | February   1942   1942m2 |
     +--------------------------+
(Declaration of interest: original author of destring.)
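The numbers 684 and -215 are just months counted from January 1960, which is easy to verify by hand or with a short Python sketch. The names monthly and tm_label below are hypothetical stand-ins for illustration, and the sketch assumes full English month names as in the example.

```python
MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def monthly(month_name, year):
    """Months since 1960m1 for a month name and a year (a %tm-style integer)."""
    month_number = MONTH_NAMES.index(month_name) + 1
    return (year - 1960) * 12 + (month_number - 1)


def tm_label(months):
    """Render the integer the way the default %tm format shows it."""
    years, month0 = divmod(months, 12)
    return f"{1960 + years}m{month0 + 1}"


for name, year in [("January", 2017), ("February", 1942)]:
    m = monthly(name, year)
    print(name, year, m, tm_label(m))
# January 2017 684 2017m1
# February 1942 -215 1942m2
```

Dates before 1960 come out negative, which is why February 1942 is stored as -215.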
See also this thread.
I am working with a dataset that has purchases per date (called ItemNum) on multiple dates across 2800 individuals. Each item is given its own line, so if an individual has purchased two items on a date, that date will appear twice. I don't care how many items were purchased on a date (with each date representing one trip), but rather about the mean number of trips made across the 2800 individuals (about 18230 lines of data). My data looks like this:
+----+------------+---------+---------------------------+
| ID | Date       | ItemNum | ItemDescript              |
+----+------------+---------+---------------------------+
|  1 | 01/22/2010 |       1 | Description of the item   |
|  1 | 01/22/2010 |       2 | Description of other item |
|  1 | 07/19/2013 |       1 |                           |
|  2 | 06/04/2012 |       1 |                           |
|  2 | 02/02/2013 |       1 |                           |
|  2 | 11/13/2013 |       1 |                           |
+----+------------+---------+---------------------------+
In the above table, person 1 made two trips and three item purchases (because two dates are shown), person 2 made three trips. I am interested in the average number of trips across all people, but first I need to collapse it down to unique dates. So I know I need to collapse on the date, but when I do
collapse (mean) ItemNum (first) Date, by(ID)
it just takes the first date that the ID shows up, not the first occurrence of each unique date.
The next issue is that once it's collapsed, I need to take the mean of the count of the dates, not the date itself, which is also where I seem to be getting tripped up.
Or perhaps something like
clear
input ID str16 dt ItemNum
1 "01/22/2010" 1
1 "01/22/2010" 2
1 "07/19/2013" 1
end
generate Date = daily(dt,"MDY")
egen trip = tag(ID Date)
collapse (sum) trip, by(ID)
summarize trip
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        trip |          1           2           .          2          2
This works if what you are looking for is the value under "Mean": a single number giving the average number of trips made by the 2800 individuals (1 individual with the limited sample data given).
Are you trying to do the following?
collapse (mean) ItemNum, by(ID Date) fast
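Both suggestions come down to counting distinct (ID, Date) pairs per person and then averaging those counts. A minimal Python sketch of that logic, using the six rows from the question as sample data (trips_per_person and mean_trips are hypothetical helper names):

```python
from collections import defaultdict

# (ID, Date, ItemNum) rows copied from the table in the question
rows = [
    (1, "01/22/2010", 1),
    (1, "01/22/2010", 2),
    (1, "07/19/2013", 1),
    (2, "06/04/2012", 1),
    (2, "02/02/2013", 1),
    (2, "11/13/2013", 1),
]


def trips_per_person(rows):
    """Number of distinct purchase dates (trips) for each ID."""
    dates = defaultdict(set)
    for person_id, day, _item in rows:
        dates[person_id].add(day)  # repeated dates collapse into one trip
    return {person_id: len(days) for person_id, days in dates.items()}


def mean_trips(rows):
    """Average number of trips across all IDs."""
    counts = trips_per_person(rows)
    return sum(counts.values()) / len(counts)


print(trips_per_person(rows))  # {1: 2, 2: 3}
print(mean_trips(rows))        # 2.5
```

In the Stata version, egen's tag() plays the role of the set: it marks exactly one observation per distinct ID and Date combination, so summing the tag gives the trip count.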
I have a dataset that has a date variable with missing dates.
var1
15sep2014
15sep2014
17sep2014
18sep2014
22sep2014
22sep2014
22sep2014
29sep2014
06oct2014
I aggregated the data using this command.
gen week = week(var1)
and the results look like this
var1         week
15sep2014    37
15sep2014    37
17sep2014    38
18sep2014    38
22sep2014    38
I was wondering whether it would be possible to get the month name and year in the week variable.
In general, week() is part of the solution if and only if you define your weeks according to Stata's rules for weeks. They are
Week 1 of the year starts on January 1, regardless.
Week 2 of the year starts on January 8, regardless.
And so on, except that week 52 of the year includes 8 or 9 days, depending on whether the year is leap or not.
Do you use these rules? I guess not. Then the simplest practice is to define a week by whichever day starts the week. If your weeks start on Sundays, then use the rule (dailydate - dow(dailydate)). If your weeks start on Mondays, ..., Saturdays, adjust the definition.
. clear
. input str9 svar1
svar1
1. "15sep2014"
2. "15sep2014"
3. "17sep2014"
4. "18sep2014"
5. "22sep2014"
6. "22sep2014"
7. "22sep2014"
8. "29sep2014"
9. "06oct2014"
10. end
. gen var1 = daily(svar1, "DMY")
. gen week = var1 - dow(var1)
. format week var1 %td
. list
     +-----------------------------------+
     |     svar1        var1        week |
     |-----------------------------------|
  1. | 15sep2014   15sep2014   14sep2014 |
  2. | 15sep2014   15sep2014   14sep2014 |
  3. | 17sep2014   17sep2014   14sep2014 |
  4. | 18sep2014   18sep2014   14sep2014 |
  5. | 22sep2014   22sep2014   21sep2014 |
     |-----------------------------------|
  6. | 22sep2014   22sep2014   21sep2014 |
  7. | 22sep2014   22sep2014   21sep2014 |
  8. | 29sep2014   29sep2014   28sep2014 |
  9. | 06oct2014   06oct2014   05oct2014 |
     +-----------------------------------+
Much more discussion here, here and here, although the first should be sufficient.
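The rule "daily date minus day of week" can be checked outside Stata too. Here is a small Python sketch, assuming Stata's convention that dow() returns 0 for Sunday through 6 for Saturday; stata_dow and week_start are hypothetical names, and the date parsing assumes English month abbreviations.

```python
from datetime import date, datetime, timedelta


def stata_dow(d):
    """Day of week with Sunday = 0, ..., Saturday = 6 (Python's weekday() has Monday = 0)."""
    return (d.weekday() + 1) % 7


def week_start(text):
    """Sunday that starts the week containing a date written like 15sep2014."""
    d = datetime.strptime(text, "%d%b%Y").date()
    return d - timedelta(days=stata_dow(d))


for s in ["15sep2014", "17sep2014", "22sep2014", "29sep2014", "06oct2014"]:
    print(s, week_start(s).strftime("%d%b%Y").lower())
# 15sep2014 14sep2014
# 17sep2014 14sep2014
# 22sep2014 21sep2014
# 29sep2014 28sep2014
# 06oct2014 05oct2014
```

For weeks starting on Monday, the same idea works with the offset shifted by one day (and wrapped so that a Sunday maps back six days).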
Instead of using the week() function, I would probably use the wofd() function to transform your %td daily date into a %tw weekly date. Then you can just play with the datetime display formats to decide exactly how to format the date. For example:
gen date_weekly = wofd(var1)
format date_weekly %twww:_Mon_ccYY
That code should give you this:
var1 date_weekly
15sep2014 37: Sep 2014
15sep2014 37: Sep 2014
17sep2014 38: Sep 2014
18sep2014 38: Sep 2014
22sep2014 38: Sep 2014
This help file will be useful:
help datetime display formats
And if you want to brush up on the difference between %tw and %td dates, you might refresh yourself here:
help datetime
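To see where week numbers such as 37 and 38 come from, here is a rough Python sketch of Stata's week rule as described earlier in this thread: week 1 starts on January 1, each week has 7 days, and week 52 absorbs the last 8 or 9 days of the year. The function name stata_week is hypothetical; this mirrors the rule and is not Stata's own code.

```python
from datetime import date


def stata_week(d):
    """Week of the year under Stata's rule: weeks start on 1 January, capped at 52."""
    day_of_year = d.timetuple().tm_yday
    return min((day_of_year - 1) // 7 + 1, 52)


for d in [date(2014, 9, 15), date(2014, 9, 17), date(2014, 9, 22), date(2014, 12, 31)]:
    print(d.isoformat(), stata_week(d))
# 2014-09-15 37
# 2014-09-17 38
# 2014-09-22 38
# 2014-12-31 52
```

Note that 15sep2014 and 17sep2014 fall in different weeks under this rule even though they are in the same calendar week, which is the reason the other answer defines weeks by their starting day instead.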