Looping over dates - stata

I'm trying to loop over dates in Stata.
I have an issue as I believe that my string variable is recognized as a date type.
For instance,
forvalues day = 1/31 {
if `day' < 10 {
local file_date ="2017-07-0`day'"
di `file_date'
}
else {
local file_date ="2017-07-`day'"
di `file_date'
}
*insert operation here
}
is printing 2009, 2008, 2007, etc.
even though the results should be 2017-07-01, 2017-07-02, etc.
Does anyone have a clue why this is happening?
By the way,
forvalues day=1/31 {
if `day' < 10 {
local file_date ="2017070`day'"
di `file_date'
}
else {
local file_date ="201707`day'"
di `file_date'
}
*insert operation here
}
works fine, but I want the hyphens in the variable.

Some minor confusions can be cleared out of the way first:
There are no string variables here in Stata's sense, just local macros.
Stata has no variable type that is a date type. Stata does have ways of handling dates, naturally, but no dedicated date types.
The key point is what happens when you type a command that includes references to local macros (or for that matter, global macros; none here, but the principle is the same).
All macro references are replaced by the contents of the macros.
Then Stata executes the command as it stands (to the best of its ability; clearly, it must be legal for that to work).
The first time around your loop, the local macro reference is interpreted, so the first di (display) command now reads
di 2017-07-01
You're inclined to see that as a date, but display cannot read your mind. It sees an expression to be evaluated; part of its job is to act as a calculator and then to display the results. Thus it sees not hyphens but minus signs (and leading zeros are always allowed in numbers, just as 0.1 is always allowed as well as .1). So the expression is evaluated as 2017 minus 7 minus 1, and why you see 2009 should now be clear.
The solution is simple: use " " to indicate to display that you think of the characters as a literal string to be displayed as it comes.
Here is how I would rewrite your code:
forvalues day = 1/31 {
local Day : di %02.0f `day'
local file_date "2017-07-`Day'"
di "`file_date'"
*insert operation here
}
See this paper for the cleaner way to loop 01, 02, ..., 09, 10, ... 31.
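An alternative sketch, not part of the answer above, is to loop over Stata daily dates directly and let a date display format produce the hyphenated text. The function mdy() and the format %tdCCYY-NN-DD are standard Stata; the macro names first, last and d are only illustrative.

```stata
* Sketch: loop over the daily dates of July 2017 (macro names are illustrative)
local first = mdy(7, 1, 2017)
local last  = mdy(7, 31, 2017)
forvalues d = `first'/`last' {
    * turn the numeric daily date into text such as 2017-07-01
    local file_date : display %tdCCYY-NN-DD `d'
    * the quotation marks make display show the text literally
    display "`file_date'"
    * insert operation here
}
```

Working from numeric dates means the same loop would also cope with ranges that cross a month boundary, where gluing day numbers onto a fixed prefix would not.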

Related

Why is the incorrect date displaying in Stata

The local system datetime is 10:34 PM 1/8/2021.
In Stata I write
local datestamp: di %tdCCYY-NN-DD daily("S_DATE","DMY")
display `datestamp'
and the output is 2012
If I write
di %tdCCYY-NN-DD daily("S_DATE","DMY")
I get 2021-01-08
Why the discrepancy? This is puzzling to me. I clearly assigned datestamp yet when I display it obviously something is wrong.
Executive summary: display saw 2021-01-08 and evaluated it as an expression in numbers. 2021 - 1 - 8 = 2012, so 2012 was what you saw.
This is a subtle question, but the answer will show Stata's perfect logic, by its own rules.
The code as posted in the question omits the crucial $ sign before S_DATE, which indicates a global macro, specifically a system macro containing the current daily date, obtained from your operating system.
It is now 9 January 2021 in my time zone, but my example will work as well as yours to show what is going on. You defined a local macro, and then you included a reference to that local macro in a call to display. The display command has a designed inclination to calculate the result of any expression it sees before it displays the result of that calculation.
Taking this more slowly: There are two quite distinct steps to the interpretation of your display command. First, as a matter of interpreting any Stata command line, all references to local and global macros are replaced with the contents of those macros (if they exist; it is not an error to refer to a macro that does not exist, but that is not an issue here). Second, display evaluates any expression it sees and then displays the result of that expression. Despite its name, display is not designed to show you directly any macro that exists, although that is what happens if the result of evaluating it leaves it the same as when it was presented. Thus if a local macro contains the string foo, that is what display will show you -- unless foo is the name of a scalar or variable, in which case the name won't be shown, just the values of that scalar or that variable (in the first observation, in the latter case).
The command to see exactly what is inside a macro, without interpretation or calculation, is macro list.
To the point, consider the different results here. In the first display command, the quotation marks " " are functional, not ornamental, and instruct display to treat its input as a string. Without the quotation marks, display is inclined to treat what it sees as numeric, and here it sees an expression, 2021 MINUS 1 MINUS 9, which evaluates to 2011. The leading zeros are ignored. In your case your date was 2021-01-08 and the result was 2012, as you reported.
. local datestamp: di %tdCCYY-NN-DD daily("$S_DATE","DMY")
. di "`datestamp'"
2021-01-09
. di `datestamp'
2011
You get the right answer with the last statement in your question. You fed display a number but instructed it to use a daily date display format to interpret that number, and you got exactly what you asked for and expected. 22288 is, or was, 8 January 2021 on a scale with origin 0 at 1 January 1960.
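As a small sketch of the macro list command mentioned above, the following fixes the date with mdy(1, 8, 2021) rather than reading $S_DATE, purely so that the result is reproducible; that substitution is an assumption made for illustration.

```stata
* Sketch: store a formatted date in a local, then inspect it three ways
local datestamp : display %tdCCYY-NN-DD mdy(1, 8, 2021)
* macro list shows the stored text without evaluating it
* (local macro names are given with a leading underscore)
macro list _datestamp
* with quotation marks, display shows the literal string 2021-01-08
display "`datestamp'"
* without them, display evaluates 2021 - 1 - 8 and shows 2012
display `datestamp'
```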

Drop variables if mean is 0 for loop

I want to drop all variables that have a mean of 0. The code I'm using is
foreach var of varlist _all {
drop 'var' if mean 'var'==0
}
and I'm getting the error message mean not found.
How can I get around this?
There are several reasons why that won't work. First, consider this suggested solution:
foreach var of varlist _all {
su `var', meanonly
if r(mean) == 0 drop `var'
}
This will work with string variables too, as the request to summarize a string variable isn't illegal, and the mean will be returned as missing.
What's wrong with your code?
Problem 1. The sequence
mean `var' == 0
is just fantasy syntax. There isn't a mean function that you can apply in this context and if there were, the syntax would be different.
Problem 2. You can drop observations using an if qualifier or you can drop variables, but you can't mix the two syntaxes. It's hard even to know what the mix would mean, but it's illegal anyway. The deeper problem here is confusing the if command and the if qualifier. See also the help for drop.
Problem 3. As typed here you have used matching quotation marks for local macro references. It's possible to guess that you really used left and right quotation marks as otherwise you would have got a different error message. Nevertheless, your code as typed would not work for that reason also.
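The distinction in Problem 2 between the if qualifier and the if command can be sketched as follows. The auto dataset that ships with Stata is used only for illustration and has nothing to do with the data in the question.

```stata
sysuse auto, clear
* if qualifier: restricts a command to some observations; here rows are dropped
drop if foreign == 1
* if command: evaluated once, and decides whether the following block runs at all
summarize price, meanonly
if r(mean) > 0 {
    display "mean of price is positive"
}
```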
A wider comment is a reminder that a mean of zero doesn't imply that all values are zero. If you wanted just to drop variables with all values zero, then findname (Stata Journal) allows that:
findname, all(# == 0)
drop `r(varlist)'
and there are extensions to allow missing values too.
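If installing findname is not an option, a rough equivalent using only official commands might look like the sketch below. The toy data and the variable names x, y and z are made up for illustration; y has mean zero without being all zero, so it should survive.

```stata
clear
input x y z
0  1 0
0 -1 0
0  0 0
end
foreach var of varlist _all {
    * skip string variables
    capture confirm numeric variable `var'
    if _rc continue
    * count observations that are not zero (missing values count as not zero)
    quietly count if `var' != 0
    if r(N) == 0 drop `var'
}
describe
```

As written, a variable containing any missing value is kept, because missing is not equal to zero.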

Stata does not replace variable value

Stata does not replace a value, as I am commanding. What is happening?
I have this variable Shutouts, which is a float variable (%9.0g).
One observation has the value = 5.08; that is an error, it should be 5.
I type: replace Shutouts = 5 if Shutouts == 5.08
And, surprisingly to me, Stata responds:
replace Shutouts=5 if Shutouts==5.08
(0 real changes made)
I have a similar problem for a variable with the same characteristics, with the name Save_perc; one value is 9.2 but should be .92. And, also this time, I receive this response from Stata:
replace Save_perc=.92 if Save_perc==9.2
(0 real changes made)
Why "0 real changes"?
It seems like a very banal problem, but I have been working on it for like 30' and I cannot really figure it out.
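It has to do with how floating-point numbers are stored in memory. Shutouts is a float, but the literal 5.08 is evaluated in double precision, and 5.08 has no exact binary representation, so the two stored approximations differ and == fails. You should not use == to compare values held at different precisions.
In your case, you could use a range comparison (the upper bound matters because missing values and any genuinely larger values also satisfy > 5.07)
replace Shutouts = 5 if Shutouts > 5.07 & Shutouts < 5.09
or round the literal to float precision before comparing
replace Shutouts = 5 if Shutouts == float(5.08)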
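The precision mismatch can be reproduced in a few lines. This is a minimal sketch with a made-up one-observation dataset; only the variable name is taken from the question.

```stata
clear
set obs 1
generate float Shutouts = 5.08
* the float value is compared with the double literal 5.08: no match
count if Shutouts == 5.08
* both sides are now at float precision: one match
count if Shutouts == float(5.08)
replace Shutouts = 5 if Shutouts == float(5.08)
list
```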

Destring a time variable using Stata

How to destring a time variable (7:00) using Stata?
I have tried destring: however, the : prevents the destring. I then tried destring, ignore(:) but was unable to then make a double and/or format %tc. encode does not work; recast does not do the job.
I also have a separate string date that I was able to destring and convert to a double.
Am I missing that I could be combining these two string variables (one date, one time) into a date/time variable or is it correct to destring them individually and then combine them into a date/time variable?
Short answer
To give the bottom line first: two string variables that hold date and time information can be converted to a single numeric date-time variable using some operation like
generate double datetime = clock(date + " " + time, "DMY hm")
format datetime %tc
except that the exact details will depend on exactly how your dates are held.
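As a sketch of that conversion, suppose (purely as an assumption, since the question does not show the data) that the dates are strings in day/month/year order and the times are strings like "7:00". A space is placed between the two strings here so that the year and the hour stay separate.

```stata
clear
input str10 date str5 time
"08/01/2021" "7:00"
"09/01/2021" "22:34"
end
* combine the two strings and convert; double is essential for date-times
generate double datetime = clock(date + " " + time, "DMY hm")
format datetime %tc
list
```

If the dates were held in another order, the mask "DMY hm" would need to change accordingly, for example to "MDY hm".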
For understanding dates and times in Stata there is no substitute for
help dates and times
Everything else tried is likely to be wrong or irrelevant or both, as your experience shows.
Longer answer, addressing misconceptions
destring, encode and recast are all (almost always) completely wrong in Stata for converting string dates and/or times to numeric dates and/or times. (I can think of one exception: if somehow a date in years had been imported as string with values "1960", "1961", etc. then destring would be quite all right.)
In reverse order,
recast is not for any kind of numeric to string or string to numeric conversion. It only recasts among numeric or among string types.
encode is essentially for mapping obvious strings to numeric and (unless you specify otherwise) will produce integer values 1, 2, 3, and so forth which will be quite wrong for times or dates in general.
destring as you applied it implies that the string times "7:00", "7:59", "8:00" should be numeric, except that someone stupidly added irrelevant punctuation. But if you strip the colons :, you get times 700, 759, 800, etc. which will not match the standard properties of times. For example, the difference between "8:00" and "7:59" is clearly one minute, but removing the informative punctuation would just yield numbers 800 and 759, which differ by 41, which makes no sense.
For a pure time, you can set up your own system, or use Stata's date-time functions.
For a time between "00:00" and "23:59" you can use Stata's date-times:
. di %tc clock("7:00", "hm")
01jan1960 07:00:00
. di %tc_HH:MM clock("7:00", "hm")
07:00
With variables you would need to generate a new variable and make sure that it is created as double.
A pure time less than 24 hours is (notionally) a time on 1 January 1960, but you can ignore that. But you need to hold in mind (constantly!) that the underlying numeric units are milliseconds. Only the format gives you a time in conventional terms.
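For times held in a variable, a minimal sketch might be the following; the variable names time and t and the two example values are assumptions for illustration.

```stata
clear
input str5 time
"7:59"
"8:00"
end
* create the numeric time as double, then show hours and minutes only
generate double t = clock(time, "hm")
format t %tcHH:MM
list
* the underlying units are milliseconds, so one minute is a difference of 60000
display t[2] - t[1]
```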
If you have times of 24 hours or more, using Stata's date-times for them is probably not a good idea.
Your own system could just be to convert string times in the form "hh:mm" to minutes and do calculations in those terms. For times held as variables, the easiest way forward would be to use split, destring to produce numeric variables holding hours and minutes and then use 60 * hours + minutes.
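The split and destring route just described could be sketched like this; the stub part and the variable name minutes are illustrative choices, and the two times are made up.

```stata
clear
input str5 time
"7:59"
"8:00"
end
* split on the colon and convert the pieces to numeric in one step
split time, parse(":") generate(part) destring
* part1 holds hours and part2 holds minutes
generate minutes = 60 * part1 + part2
list
```

On this scale the two example times differ by 1, as they should.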
However, despite your title, the real problem here seems to be dealing jointly with date and time information, not just time information, so at this point, you might like to read the short answer again.

Formatting and displaying locals in Stata

I came across a little puzzle with Stata's locals, display, and quotes.
Consider this example:
generate var1 = 54321 in 1
local test: di %10.0gc var1[1]
Why is the call:
di "`test'"
returning
54,321
Whereas the call:
di `test'
shows
54 321
What is causing such behaviour?
Complete the sequence with
(1)
. di 54,321
54 321
(2)
. di "54,321"
54,321
display interprets (1) as an instruction to display two arguments, one by one. You get the same result with your last line as (first) the local macro test was evaluated and (second) display saw the result of the evaluation.
The difference when quotation marks are supplied is that thereby you insist that the argument is a literal string. You get the same result with your first display command for the same reasons as just given.
In short, the use of local macros here is quite incidental to the differences in results. display never sees the local macro as such; it just sees its contents after evaluation. So, what you are seeing pivots entirely on nuances in what is presented to display.
Note further that while you can use a display format in defining the contents of a local macro, that ends that story. A local does not have an attached format that sticks with it. It's just a string (which naturally may mean a string with numeric characters).
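That last point can be sketched as follows; the macro names shown and raw are illustrative. A format used while defining a local is applied once and only the resulting text is kept, so a local holding a plain number needs a format to be supplied again whenever it is displayed.

```stata
* the format is applied once here; the local stores only the resulting text
local shown : display %10.0gc 54321
* this local stores the unformatted number
local raw = 54321
* literal text, commas included
display "`shown'"
* no format travels with the local, so supply one at display time
display %10.0gc `raw'
* without a format, display falls back on its default
display `raw'
```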