Formatting and displaying locals in Stata

I came across a little puzzle with Stata's locals, display, and quotes.
Consider this example:
generate var1 = 54321 in 1
local test: di %10.0gc var1[1]
Why is the call:
di "`test'"
returning
54,321
Whereas the call:
di `test'
shows
54 321
What is causing such behaviour?

Complete the sequence with
(1)
. di 54,321
54 321
(2)
. di "54,321"
54,321
display interprets (1) as an instruction to display two arguments, one by one. You get the same result with your last line as (first) the local macro test was evaluated and (second) display saw the result of the evaluation.
The difference when quotation marks are supplied is that thereby you insist that the argument is a literal string. You get the same result with your first display command for the same reasons as just given.
In short, the use of local macros here is quite incidental to the differences in results. display never sees the local macro as such; it just sees its contents after evaluation. So, what you are seeing pivots entirely on nuances in what is presented to display.
Note further that while you can use a display format in defining the contents of a local macro, that ends that story. A local does not have an attached format that sticks with it. It's just a string (which naturally may mean a string with numeric characters).
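A minimal sketch of that last point, assuming two illustrative local names (raw and nice) that are not in the question: the same macro contents can be shown under any format, and a macro defined through a format simply holds the resulting text.

```stata
// raw and nice are illustrative names, not from the question
local raw 54321
display %10.0gc `raw'      // format applied at display time only
display %12.2fc `raw'      // a different format, same macro contents
local nice : display %10.0gc `raw'
display "`nice'"           // nice holds formatted text; no format is attached
```

Neither macro remembers a format: raw can be shown under any format on each call, while nice is fixed text from the moment it is defined.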

Related

Why is the incorrect date displaying in Stata

The local system datetime is 10:34 PM 1/8/2021.
In Stata I write
local datestamp: di %tdCCYY-NN-DD daily("S_DATE","DMY")
display `datestamp'
and the output is 2012
If I write
di %tdCCYY-NN-DD daily("S_DATE","DMY")
I get 2021-01-08
Why the discrepancy? This is puzzling to me. I clearly assigned datestamp yet when I display it obviously something is wrong.
Executive summary: display saw 2021-01-08 and evaluated it as an expression in numbers. 2021 - 1 - 8 = 2012, so 2012 was what you saw.
This is a subtle question, but the answer will show Stata's perfect logic, by its own rules.
The code as posted in the question omits the crucial $ sign before S_DATE, which indicates a global macro, specifically a system macro containing the current daily date, obtained from your operating system.
It is now 9 January 2021 in my time zone, but my example will work as well as yours to show what is going on. You defined a local macro, and then you included a reference to that local macro in a call to display. The display command has a designed inclination to calculate the result of any expression it sees before it displays the result of that calculation.
Taking this more slowly: There are two quite distinct steps to the interpretation of your display command. First, as a matter of interpreting any Stata command line, all references to local and global macros are replaced with the contents of those macros (if they exist; it is not an error to refer to a macro that does not exist, but that is not an issue here). Second, display evaluates any expression it sees and then displays the result of that expression. Despite its name, display is not designed to show you directly any macro that exists, although that is what happens if the result of evaluating it leaves it the same as when it was presented. Thus if a local macro contains the string foo, that is what display will show you -- unless foo is the name of a scalar or variable, in which case the name won't be shown, just the values of that scalar or that variable (in the first observation, in the latter case).
The command to see exactly what is inside a macro, without interpretation or calculation, is macro list.
To the point, consider the different results here. In the first display command, the quotation marks " " are functional, not ornamental, and instruct display to treat its input as a string. Without the quotation marks, display is inclined to treat what it sees as numeric, and here it sees an expression, 2021 MINUS 1 MINUS 9, which evaluates to 2011. The leading zeros are ignored. In your case your date was 2021-01-08 and the result was 2012, as you reported.
. local datestamp: di %tdCCYY-NN-DD daily("$S_DATE","DMY")
. di "`datestamp'"
2021-01-09
. di `datestamp'
2011
You get the right answer with the last statement in your question. You fed display a number but instructed it to use a daily date display format to interpret that number, and you got exactly what you asked for and expected. 22288 is, or was, 8 January 2021 on a scale with origin 0 at 1 January 1960.
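The two steps described above (macro substitution first, evaluation by display second) can be checked with a short sketch. It assumes a fixed date built with mdy() in place of the system date, so the output does not depend on the day it is run.

```stata
// mdy(1, 8, 2021) stands in for the system date used in the question
local datestamp : display %tdCCYY-NN-DD mdy(1, 8, 2021)
macro list _datestamp        // shows the stored text, with no evaluation
display "`datestamp'"        // treated as a literal string
display `datestamp'          // evaluated as 2021 - 1 - 8
display %tdCCYY-NN-DD mdy(1, 8, 2021)   // a number shown with a date format
```

Note the underscore prefix in macro list _datestamp, which is how that command refers to a local macro.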

Scalar formatting problem with number of digits

I am trying to write a command that ereturns a scalar that is a percentage rounded to 2 decimal places. The percentage can be negative or positive, with an unknown number of digits before the decimal point.
Here's an MRE that shows the problem I am having.
#delimit;
capture program drop my_note;
program my_note, eclass;
local my_x: display %-9.2f 92.23999999999999;
ereturn scalar my_x = `my_x';
end;
ereturn clear;
my_note;
ereturn list;
display %-9.2f 92.23999999999999;
display 92.23999999999999;
I am puzzled why display seems to do the right thing (turn 92.23999999999999 to 92.24, though regardless of format), but that e(my_x) does not seem to inherit that format.
When you create your scalar, you copy the contents of the local `my_x' into a numeric scalar, and a numeric scalar does not carry a display format. A display format changes only how a value is shown, not the underlying data, so ereturn list shows the stored number at full precision. Think of it this way: 5.00e+2 and 500 are the same data; only how the value is shown differs.
You need to use strings to work with how the value is displayed. However, there are two issues with strings in your code example.
First, while scalars normally can hold both strings and numeric values, returned scalars cannot hold strings (don't ask me why). Would it be possible to return a local instead?
Second, in %-9.2f you specify that the display format is 9 characters wide, so your result will be e(my_x) : "92.24" padded with trailing spaces. You can adjust %-9.2f, but since you are now working with strings you can remove the excess spaces using the trim() function.
Try the code below and see if that works given the context of this function. If not, tell us more about what you are trying to do.
#delimit;
capture program drop my_note;
program my_note, eclass;
local my_x: display %-9.2f 92.23999999999999;
ereturn local my_x = trim("`my_x'");
end;
ereturn clear;
my_note;
ereturn list;
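If the number is also needed for later computation, one option is to return both forms. This is only a sketch: the program name my_note2 and the result name my_x_str are hypothetical, and it is written for the default line delimiter (#delimit cr) rather than the semicolon delimiter used above.

```stata
// my_note2 and my_x_str are hypothetical names used only for this sketch
capture program drop my_note2
program my_note2, eclass
    local my_x : display %9.2f 92.23999999999999
    ereturn local my_x_str = trim("`my_x'")     // text, for reporting
    ereturn scalar my_x = 92.23999999999999     // number, for computation
end

ereturn clear
my_note2
display "`e(my_x_str)'"      // the formatted text
display %9.2f e(my_x)        // the number, formatted at display time
```

The numeric scalar keeps full precision, and the format is applied only where the value is shown.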

ERROR: P does not have a numeric suffix (SAS, RENAME)

After having worked out a bunch of other errors I'm left with the following
ERROR: P does not have a numeric suffix.
From all the info I've been able to find this happens a lot when using PROC TRANSPOSE, however I'm not using that here (and don't anywhere else in this code).
Data Spillover_HE (rename=(F1=FY F2=BN F3=employeeID F4=grade_subject_ID
F5=AsmtID_agg F6=linkB F7=subgroupID F8=w F9=MGP_SE F10=Residual_SE
F11=Residual_Var F12=mgp_var F13=student_n F14=calcID F15=sumwt F16=MGP
F17=ave_prescore F18=p_imp F19=p_postImp F20=p_sped F21=p_sped_rs
F22=p_sped_se_ss F23=p_sped_st F24=p_sped_tt F25=P-ell F26=p_ed
F27=p_hispanic F28=p_black F29=p_white F30=p_asian F31=p_other
F32=p_blahispmale F33=p_overaundcred F34=p_retained F35=p_transfer
F36=p_top10 F37=p_top5 F38=p_top1 F39=p_bot10 F40=p_bot5 F41=p_bot1
F42=target_population F43=mean_residual_var F44=P_0_5)); run;
Obviously I have a bunch of variables that start with "p". None of them are underlined in the log. I'm using SAS Base, and got the same error in SAS Enterprise Guide.
Not sure what my next move should be. Thanks.
A dash is not a correct character in a variable name.
Replace F25=P-ell with F25=P_ell.
You can use dash to specify a range of variables e.g. rename=(x1-x100=y1-y100). This code renames 100 variables with prefix x to y.

How do I loop over part of a variable name?

I need to use a local macro to loop over part of a variable name in Stata.
Here is what I tried to do:
local phth mep mibp mbp
tab lod_`phth'_BL
Stata will not recognize the entire variable name.
variable lod_mep not found
r(111);
If I remove the underscore after the `phth' it still does not recognize anything after the macro name.
I want to avoid using a complicated foreach loop.
Is there any way this can be done just using the simple macro?
Thanks!
Your request is a bit confusing. First, this is precisely the purpose of a loop, and second, loops in Stata are (at the "introductory level") quite simple. The following example is a bit nonsensical (and given the structure, there are easier ways of going about this), but should convey the basic idea.
// set up a similar variable name structure
sysuse auto , clear
rename (price mpg weight length) ///
(pref_base1_suff pref_base2_suff pref_base3_suff pref_base4_suff)
// define a local macro to hold the elements to loop over
local varbases = "base1 base2 base3 base4"
// refer to the items of the local macro in a loop
foreach b of local varbases {
summ pref_`b'_suff
}
See help foreach for the syntax of foreach. In particular, note that the structure employed above may not even be required due to Stata's varlist structure (see help varlist). For example, continuing with the code above:
foreach v of varlist pref_base?_suff {
summ `v'
}
The wildcard ? takes the place of one character. * could be used for more flexibility. However, if your variables are not as easily identifiable using the pattern matching allowed by varlist, a loop as in the first example is simple enough -- four very short lines of code.
Postscript
Upon further reflection (sometimes the structure of the question anchors a certain method, when an alternative approach is more straightforward), searching the help files for information on the tabulate command (help tabulate) will direct you to the following syntax: tab1 varlist [if] [in] [weight] [, tab1_options]
Given the discussion above about the use of varlists, you can simply code
tab1 lod_m*_BL
assuming, of course, that there are no other variables matching the pattern for which you do not want to report a frequency table. Alternatively,
tab1 lod_mep_BL lod_mibp_BL lod_mbp_BL
is not much longer and does the trick, albeit without the use of any sort of wildcard or macro substitution.
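It may help to see why the original command fails. Macro substitution is plain text replacement, so tab lod_`phth'_BL becomes tab lod_mep mibp mbp_BL, and Stata stops at the first name it cannot find, lod_mep. A minimal sketch of the loop applied to the names in the question follows; the data are made up purely so that the code runs.

```stata
// toy data with the variable names from the question; the values are made up
clear
set obs 6
foreach p in mep mibp mbp {
    generate lod_`p'_BL = mod(_n, 2)
}

// loop over the elements of the local, one variable name per pass
local phth mep mibp mbp
foreach p of local phth {
    tabulate lod_`p'_BL
}
```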

Display and concatenate formatted numbers as a string

I'd like to have some pretty output that combines several local macros that have been formatted and put into a single line.
I've tried multiple different configurations, but here's basically what I'd like:
loc number1 = 12.20645161
loc number2 = 52.81451247
di "something here"
Desired Output: "number 1 is 12.2065 and number 2 is 52.8145"
I can format a single local macro:
di %12.4f `number1'
And I can concatenate two unformatted macros:
di "number 1 is `number1' and number 2 is `number2'"
But can't seem to do both simultaneously. Is there a way to format the macro early or do some inline formatting or append formatted strings to each other?
You are much of the way there, but there are some misconceptions here too.
You can't format a local macro in the sense that you can assign a format to a local macro. What you are doing is telling display to use a format in showing the value of that macro, but the macro itself is unaffected and the format doesn't stick. In fact the macro and the format are never associated in any strict sense; it's entirely a matter of the display command putting your instructions together, what to show and precisely how to show it.
This is not fundamentally different from similar commands in many other languages.
One solution is
loc number1 12.20645161
loc number2 52.81451247
di "number 1 is " %5.4f `number1' " and number 2 is " %5.4f `number2'
Note that omitting the = signs assigns the numbers as string equivalents; there is no conversion to binary and back to decimal that way. The difference would not bite in this example.
Further notes:
Avoid round() here like the plague. The solution to formatting problems is a format, not a numeric operation. It will work much of the time, but it's not guaranteed. It won't guarantee exactly what you want always because almost all decimal numbers cannot be held exactly as binaries and that will bite sometimes.
You can do this
local nice1 : di %5.4f `number1'
local nice2 : di %5.4f `number2'
di "number 1 is `nice1' and number 2 is `nice2'"
That doesn't assign a format either, but it is the string manipulation you seek.
The way to think of it is: Macros hold strings. When you want to manipulate strings as strings, use string operations only.
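Another way to build the whole line as a single string, offered as a sketch rather than as part of the answer above, is the string() function with a format as its second argument. The local name msg is illustrative.

```stata
local number1 12.20645161
local number2 52.81451247
// msg is an illustrative name; string(n, fmt) returns n formatted as text
local msg = "number 1 is " + string(`number1', "%5.4f") + " and number 2 is " + string(`number2', "%5.4f")
display "`msg'"
```

Here + concatenates strings, so the formatting and the joining both happen as string operations.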
OK, so the only way I've found to edit macros is to do some formatting ahead of time. Instead of applying a format such as %4.2f to the macro inline, you can call round() on the value to the nearest .01 and save the result. Then it will appear correctly as part of display. You can, however, format variable elements inline, and you don't need any concatenation symbols (+ or &) to do so.
sysuse auto
// get price difference
loc p_inc = (price[2] - price[1]) / price[2] * 100
// preformat local macro
loc p_inc = round(`p_inc',.01)
// format variables inline
di "price of car 2 (" %-5.0fc price[2] ") is `p_inc'% bigger than price of car 1 (" %-5.0fc price[1] ")"
Output: price of car 2 (4,749) is 13.69% bigger than price of car 1 (4,099)