Display and concatenate formatted numbers as a string - stata

I'd like to have some pretty output that combines several formatted local macros in a single line.
I've tried multiple different configurations, but here's basically what I'd like:
loc number1 = 12.20645161
loc number2 = 52.81451247
di "something here"
Desired Output: "number 1 is 12.2065 and number 2 is 52.8145"
I can format a single local macro:
di %12.4f `number1'
And I can concatenate two unformatted macros:
di "number 1 is `number1' and number 2 is `number2'"
But I can't seem to do both simultaneously. Is there a way to format the macro early, do some inline formatting, or append formatted strings to each other?

You are much of the way there, but there are some misconceptions here too.
You can't format a local macro in the sense that you can assign a format to a local macro. What you are doing is telling display to use a format in showing the value of that macro, but the macro itself is unaffected and the format doesn't stick. In fact the macro and the format are never associated in any strict sense; it's entirely a matter of the display command putting your instructions together, what to show and precisely how to show it.
This is not fundamentally different from similar commands in many other languages.
One solution is
loc number1 12.20645161
loc number2 52.81451247
di "number 1 is " %5.4f `number1' " and number 2 is " %5.4f `number2'
Note that omitting the = signs assigns the numbers as string equivalents; there is no conversion to binary and back to decimal that way. The difference would not bite in this example.
Further notes:
Avoid round() here like the plague. The solution to formatting problems is a format, not a numeric operation. round() will work much of the time, but it is not guaranteed to give exactly what you want, because almost all decimal numbers cannot be held exactly in binary, and that will bite sometimes.
You can do this
local nice1 : di %5.4f `number1'
local nice2 : di %5.4f `number2'
di "number 1 is `nice1' and number 2 is `nice2'"
That doesn't assign a format either, but it is the string manipulation you seek.
The way to think of it is: Macros hold strings. When you want to manipulate strings as strings, use string operations only.
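The contrast between formatting and rounding is not specific to Stata. As a rough illustration, here is a minimal Python sketch (Python stands in for Stata, and the variable names are made up for the example): formatting yields a string with exactly the digits asked for, while rounding yields another binary number that only approximates the decimal target.

```python
from decimal import Decimal

number1 = 12.20645161

# Formatting produces a string, loosely analogous to
# `local nice1 : di %5.4f ...` in Stata.
nice1 = f"{number1:.4f}"

# Rounding produces another binary float. 12.2065 has no exact binary
# representation, so the stored value is only close to it.
rounded = round(number1, 4)
exact_value_of_rounded = Decimal(rounded)

print(nice1)                                          # 12.2065
print(exact_value_of_rounded == Decimal("12.2065"))   # False
```

The string is what belongs in a sentence of output; the rounded number is still a number and will be shown according to whatever format is applied later.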

Ok, so the only way I've found to edit macros is to do some formatting ahead of time. So instead of applying a format such as %4.2f to the macro inline, you can call round() on the value to the nearest .01 and save the result. Then it will appear correctly as part of display. You can, however, format variable elements inline, and you don't need any concatenation symbols (+ or &) to do so.
sysuse auto
// get price difference
loc p_inc = (price[2] - price[1]) / price[2] * 100
// preformat local macro
loc p_inc = round(`p_inc',.01)
// format variables inline
di "price of car 2 (" %-5.0fc price[2] ") is `p_inc'% bigger than price of car 1 (" %-5.0fc price[1] ")"
Output: price of car 2 (4,749) is 13.69% bigger than price of car 1 (4,099)

Controlling newlines when writing out arrays in Fortran

So I have some code that does essentially this:
REAL, DIMENSION(31) :: month_data
INTEGER :: no_days
no_days = get_no_days()
month_data = [fill array with some values]
WRITE(1000,*) (month_data(d), d=1,no_days)
So I have an array with values for each month, in a loop I fill the array with a certain number of values based on how many days there are in that month, then write out the results into a file.
It took me quite some time to wrap my head around the whole 'write out an array in one go' aspect of WRITE, but this seems to work.
However this way, it writes out the numbers in the array like this (example for January, so 31 values):
0.00000 10.0000 20.0000 30.0000 40.0000 50.0000 60.0000
70.0000 80.0000 90.0000 100.000 110.000 120.000 130.000
140.000 150.000 160.000 170.000 180.000 190.000 200.000
210.000 220.000 230.000 240.000 250.000 260.000 270.000
280.000 290.000 300.000
So it prefixes a lot of spaces (presumably to make columns line up even when there are larger values in the array), and it wraps lines to make it not exceed a certain width (I think 128 chars? not sure).
I don't really mind the extra spaces (although they inflate my file sizes considerably, so it would be nice to fix that too...) but the line breaks screw up my other tooling. I've tried reading several Fortran manuals, but while some of them mention 'output formatting', I have yet to find one that mentions newlines or columns.
So, how do I control how arrays are written out when using the syntax above in Fortran?
(Also, while we're at it, how do I control the number of decimal digits? I know these are all integer values, so I'd like to leave out any decimals altogether, but I can't change the data type to INTEGER in my code because of reasons.)
You probably want something similar to
WRITE(1000,'(31(F6.0,1X))') (month_data(d), d=1,no_days)
Explanation:
The use of * as the format specification is called list directed I/O: it is easy to code, but you are giving away all control over the format to the processor. In order to control the format you need to provide explicit formatting, via a label to a FORMAT statement or via a character variable.
Use the F edit descriptor for real variables in decimal form. Its syntax is Fw.d, where w is the total width of the field (including the decimal point and any sign) and d is the number of digits after the decimal point. F6.0 therefore means a field 6 characters wide with no digits after the decimal point (the decimal point itself is still printed).
Spaces can be added with the X control edit descriptor.
Repetitions of edit descriptors can be indicated with the number of repetitions before a symbol.
Groups can be created with (...), and they can be repeated if preceded by a number of repetitions.
No more items are printed beyond the last provided variable, even if the format specifies how to print more items than the ones actually provided - so you can ask for 31 repetitions even if for some months you will only print data for 30 or 28 days.
Besides,
New lines could be added with the / control edit descriptor; e.g., if you wanted to print the data with 10 values per row, you could do
WRITE(1000,'(4(10(F6.0,:,1X),/))') (month_data(d), d=1,no_days)
Note the : control edit descriptor in this second example: it indicates that, if there are no more items to print, nothing else should be printed - not even spaces corresponding to control edit descriptors such as X or /. While it could have been used in the previous example, it is more relevant here, in order to ensure that, if no_days is a multiple of 10, there isn't an empty line after the 3 rows of data.
If you want to remove the decimal point completely, print the nearest integers instead, using the nint intrinsic and the Iw (integer) edit descriptor:
WRITE(1000,'(31(I6,1X))') (nint(month_data(d)), d=1,no_days)
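To check what the downstream tooling will receive, the row layout produced by the 10-values-per-row format can be mimicked outside Fortran. The following is only a rough Python sketch: the function name and the sample data are made up, and it imitates the layout (fixed-width fields, one space between them, a new line after every 10 values), not Fortran's full formatting rules.

```python
def layout_rows(values, per_row=10, width=6):
    """Imitate the row layout of '(4(10(F6.0,:,1X),/))'.

    Each value is right-aligned in `width` characters with a trailing
    decimal point and no decimals; values in a row are separated by one
    space; a new row starts after every `per_row` values.
    """
    rows = []
    for start in range(0, len(values), per_row):
        chunk = values[start:start + per_row]
        # '#' keeps the decimal point even though no decimals are shown.
        rows.append(" ".join(f"{v:#{width}.0f}" for v in chunk))
    return rows


january = [10.0 * d for d in range(31)]   # hypothetical sample data
for row in layout_rows(january):
    print(row)
```

With 31 values this gives three full rows and a fourth row holding a single value; with 30 values there are exactly three rows and no empty trailing line, which is the case the : descriptor is there to handle.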

How to extract number and/or string in sas

Generic drug names are sometimes formatted like this:
X-Y Tab 5-325 MG, where the drug of interest is X and the amount is 5. A different column has pre-extracted this as: 5 MG-325 M or 10 MG-325 depending on the amount of X.
Is there a way I can extract the amount that is associated with a MG? I am not sure if it's possible to use IS LIKE since the column is character format, and then converting the amount since there is a space between the number and MG.
The examples you are providing suggest that the pre-extracted string is a number, then a space, then MG. If that's always true, you can use the scan function to get the number part of your string (pre_extracted below stands for whatever your column is actually called):
amount = scan(pre_extracted, 1, ' ');
You can do this in a data step. The result is a character value; wrap it in input() if you need a number. Look up the scan function and you can see other options that may help you customize this further.
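The extraction logic itself is simple enough to sketch outside SAS. The Python below is an illustration only, with made-up function names: one helper takes the first space-delimited word, roughly what scan does with a space delimiter, and the other looks for a number directly followed by MG, which is a stricter variant and an assumption about what "associated with a MG" should mean.

```python
import re


def first_word(text):
    """Rough analogue of scan(text, 1, ' '): the first space-delimited word."""
    return text.split()[0]


def amount_before_mg(text):
    """Return the first number directly followed by 'MG', or None if absent."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*MG\b", text)
    return float(match.group(1)) if match else None


print(first_word("5 MG-325 M"))         # 5
print(amount_before_mg("10 MG-325"))    # 10.0
```

Note that on the raw drug name ("X-Y Tab 5-325 MG") the second helper would pick up 325, not 5, so it is only suitable for the pre-extracted column.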

Stata does not replace variable value

Stata does not replace a value, as I am commanding. What is happening?
I have this variable Shutouts, which is a float variable (%9.0g).
One observation has the value = 5.08; that is an error, it should be 5.
I type: replace Shutouts = 5 if Shutouts == 5.08.
And, surprisingly to me, Stata responds:
replace Shutouts=5 if Shutouts==5.08
(0 real changes made)
I have a similar problem for a variable with the same characteristics, with the name Save_perc; one value is 9.2 but should be .92. And, also this time, I receive this response from Stata:
replace Save_perc=.92 if Save_perc==9.2
(0 real changes made)
Why "0 real changes"?
It seems like a very banal problem, but I have been working on it for like 30' and I cannot really figure it out.
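It has to do with how floating-point numbers are stored in memory. Shutouts is stored as a float (single precision), but the literal 5.08 is evaluated in double precision. 5.08 has no exact binary representation, so the two approximations differ slightly and the == comparison fails.
In your case, you should just use
replace Shutouts = 5 if inrange(Shutouts, 5.07, 5.09)
or
replace Shutouts = 5 if Shutouts == float(5.08)
The same applies to Save_perc: compare against float(9.2) rather than 9.2.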
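The failed comparison can be reproduced outside Stata. Here is a minimal Python sketch (the helper name is made up; it uses the standard struct module to round a double to single precision, which is roughly what storing a value in a float variable does):

```python
import struct


def as_float32(x):
    """Round a double-precision number to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


stored = as_float32(5.08)            # what a float variable would hold

print(stored == 5.08)                # False: single vs double approximation
print(stored == as_float32(5.08))    # True: both sides rounded the same way
print(as_float32(5.0) == 5.0)        # True: 5 is exactly representable
```

This is why wrapping the literal in float() makes the comparison succeed, and why values such as 5 that are exactly representable never show the problem.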

Destring a time variable using Stata

How to destring a time variable (7:00) using Stata?
I have tried destring: however, the : prevents the destring. I then tried destring, ignore(:) but was unable to then make a double and/or format %tc. encode does not work; recast does not do the job.
I also have a separate string date that I was able to destring and convert to a double.
Am I missing that I could be combining these two string variables (one date, one time) into a date/time variable or is it correct to destring them individually and then combine them into a date/time variable?
Short answer
To give the bottom line first: two string variables that hold date and time information can be converted to a single numeric date-time variable using some operation like
generate double datetime = clock(date + " " + time, "DMY hm")
format datetime %tc
except that the exact details will depend on exactly how your dates are held.
For understanding dates and times in Stata there is no substitute for
help dates and times
Everything else tried is likely to be wrong or irrelevant or both, as your experience shows.
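To make the target of that conversion concrete: a %tc value is a count of milliseconds since the start of 1 January 1960. The Python sketch below imitates the combination of a date string and a time string into such a count. It is an illustration only; the function name and the day/month/year layout with slashes are assumptions, not anything taken from the question's data.

```python
from datetime import datetime, timedelta

STATA_EPOCH = datetime(1960, 1, 1)   # origin of %tc date-times


def to_stata_clock(date_text, time_text, fmt="%d/%m/%Y %H:%M"):
    """Combine a date string and a time string into milliseconds since 1960."""
    moment = datetime.strptime(f"{date_text} {time_text}", fmt)
    return (moment - STATA_EPOCH) // timedelta(milliseconds=1)


print(to_stata_clock("01/01/1960", "7:00"))   # 25200000
print(to_stata_clock("02/01/1960", "0:00"))   # 86400000
```

Values of this size are the reason the new variable must be a double: a float cannot hold them without losing precision.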
Longer answer, addressing misconceptions
destring, encode and recast are all (almost always) completely wrong in Stata for converting string dates and/or times to numeric dates and/or times. (I can think of one exception: if somehow a date in years had been imported as string with values "1960", "1961", etc. then destring would be quite all right.)
In reverse order,
recast is not for any kind of numeric to string or string to numeric conversion. It only recasts among numeric or among string types.
encode is essentially for mapping obvious strings to numeric and (unless you specify otherwise) will produce integer values 1, 2, 3, and so forth which will be quite wrong for times or dates in general.
destring as you applied it implies that the string times "7:00", "7:59", "8:00" should be numeric, except that someone stupidly added irrelevant punctuation. But if you strip the colons :, you get times 700, 759, 800, etc. which will not match the standard properties of times. For example, the difference between "8:00" and "7:59" is clearly one minute, but removing the informative punctuation would just yield numbers 800 and 759, which differ by 41, which makes no sense.
For a pure time, you can set up your own system, or use Stata's date-time functions.
For a time between "00:00" and "23:59" you can use Stata's date-times:
. di %tc clock("7:00", "hm")
01jan1960 07:00:00
. di %tc_HH:MM clock("7:00", "hm")
07:00
With variables you would need to generate a new variable and make sure that it is created as double.
A pure time less than 24 hours is (notionally) a time on 1 January 1960, but you can ignore that. But you need to hold in mind (constantly!) that the underlying numeric units are milliseconds. Only the format gives you a time in conventional terms.
If you have times of more than 24 hours, holding them as Stata date-times is probably not a good idea.
Your own system could just be to convert string times in the form "hh:mm" to minutes and do calculations in those terms. For times held as variables, the easiest way forward would be to use split, destring to produce numeric variables holding hours and minutes and then use 60 * hours + minutes.
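The minutes-based system is easy to sketch. The Python below is an illustration with a made-up function name; it splits on the colon and applies 60 * hours + minutes, then contrasts the result with what stripping the colon would give.

```python
def minutes_since_midnight(hhmm):
    """Convert a string such as '7:59' to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return 60 * int(hours) + int(minutes)


# Difference computed on proper minutes: one minute, as expected.
print(minutes_since_midnight("8:00") - minutes_since_midnight("7:59"))   # 1

# Difference after merely stripping the colon: meaningless.
print(int("8:00".replace(":", "")) - int("7:59".replace(":", "")))       # 41
```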
However, despite your title, the real problem here seems to be dealing jointly with date and time information, not just time information, so at this point, you might like to read the short answer again.

Formatting and displaying locals in Stata

I came across a little puzzle with Stata's locals, display, and quotes.
Consider this example:
generate var1 = 54321 in 1
local test: di %10.0gc var1[1]
Why is the call:
di "`test'"
returning
54,321
Whereas the call:
di `test'
shows
54 321
What is causing such behaviour?
Complete the sequence with
(1)
. di 54,321
54 321
(2)
. di "54,321"
54,321
display interprets (1) as an instruction to display two arguments, one by one. You get the same result with your last line as (first) the local macro test was evaluated and (second) display saw the result of the evaluation.
The difference when quotation marks are supplied is that thereby you insist that the argument is a literal string. You get the same result with your first display command for the same reasons as just given.
In short, the use of local macros here is quite incidental to the differences in results. display never sees the local macro as such; it just sees its contents after evaluation. So, what you are seeing pivots entirely on nuances in what is presented to display.
Note further that while you can use a display format in defining the contents of a local macro, that ends that story. A local does not have an attached format that sticks with it. It's just a string (which naturally may mean a string with numeric characters).
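The same substitute-first, interpret-second behaviour can be imitated in other languages. The Python sketch below is only an analogy, with made-up names: the macro's contents are pasted into a command as text, and only then is the command evaluated, so a comma that was part of a formatted number turns into an argument separator unless the text is quoted.

```python
def show(*args):
    """Stand-in for display: join all arguments with a space."""
    return " ".join(str(a) for a in args)


test = "54,321"   # contents of the macro: just text

# The text is substituted into the command before it is interpreted.
unquoted = eval(f"show({test})", {"show": show})      # show(54,321)
quoted = eval(f'show("{test}")', {"show": show})      # show("54,321")

print(unquoted)   # 54 321
print(quoted)     # 54,321
```

In both calls the command never sees a variable named test, only the text that was substituted for it.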