Stata does not replace variable value - replace

Stata does not replace a value as I instruct it to. What is happening?
I have a variable Shutouts, which is a float variable (%9.0g).
One observation has the value 5.08; that is an error, and it should be 5.
I type: replace Shutouts = 5 if Shutouts == 5.08.
And, surprisingly to me, Stata responds:
replace Shutouts=5 if Shutouts==5.08
(0 real changes made)
I have a similar problem with another variable of the same type, Save_perc: one value is 9.2 but should be .92. Again, Stata responds:
replace Save_perc=.92 if Save_perc==9.2
(0 real changes made)
Why "0 real changes"?
It seems like a very banal problem, but I have been working on it for about 30 minutes and I cannot figure it out.

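It has to do with how floating-point numbers are stored in memory. You should not use == to compare values held in two different precisions: Shutouts is a float (4 bytes), but Stata evaluates the literal 5.08 as a double (8 bytes). Neither can hold 5.08 exactly in binary, and the two approximations differ, so the comparison fails. The same happens with Save_perc and 9.2.
In your case, you can either select a narrow range
replace Shutouts = 5 if Shutouts > 5.07 & Shutouts < 5.09
or round the literal to float precision before comparing
replace Shutouts = 5 if Shutouts == float(5.08)
replace Save_perc = .92 if Save_perc == float(9.2)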
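To see the mismatch outside Stata, here is a minimal Python sketch. It assumes that a Stata float is an IEEE 4-byte single-precision value and that the literal 5.08 is evaluated as an 8-byte double; the helper to_float32 (a name made up for this sketch) uses struct to round a double to single precision, which is roughly what Stata's float() does.

```python
import struct


def to_float32(x: float) -> float:
    """Round a double to the nearest 4-byte float, then widen it back to a double."""
    return struct.unpack("f", struct.pack("f", x))[0]


stored = to_float32(5.08)  # what a float variable such as Shutouts actually holds

print(stored == 5.08)              # False: the right-hand 5.08 is a double
print(stored == to_float32(5.08))  # True: both sides rounded to float precision
print(f"{stored:.17g}")            # the value really stored, shown with 17 digits
```

Rounding both sides to the same precision is what makes the float(5.08) comparison succeed.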

Related

Scalar formatting problem with number of digits

I am trying to write a command that ereturns a scalar holding a percentage rounded to 2 decimal places. The percentage can be negative or positive, with an unknown number of digits before the decimal point.
Here's an MRE that shows the problem I am having.
#delimit;
capture program drop my_note;
program my_note, eclass;
local my_x: display %-9.2f 92.23999999999999;
ereturn scalar my_x = `my_x';
end;
ereturn clear;
my_note;
ereturn list;
display %-9.2f 92.23999999999999;
display 92.23999999999999;
I am puzzled that display seems to do the right thing (turning 92.23999999999999 into 92.24, regardless of the format), while e(my_x) does not seem to inherit that format.
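When you create your scalar you copy the contents of the local `my_x'. The : display step changes only how the number is written out: the local holds the text 92.24 (padded with spaces), and assigning that text to a numeric scalar turns it back into the nearest binary double. 92.24 cannot be stored exactly in binary, and its nearest double is the very same double as 92.23999999999999, which is why e(my_x) still shows up as 92.23999999999999. Think of it as 5.00e+2 and 500: the data are the same, only the way the value is shown differs.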
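A minimal Python sketch of the same effect, assuming Stata's numeric scalars are IEEE 8-byte doubles like Python floats. Python's format mini-language stands in for %-9.2f here: formatting produces text, and reading that text back gives the nearest double again.

```python
# Format to 2 decimals, left-justified in a 9-character field, like %-9.2f.
text = f"{92.23999999999999:<9.2f}"
print(repr(text))  # '92.24    '

# Reading the text back as a number yields the nearest double to 92.24 ...
value = float(text)
# ... which is the same double as the literal 92.23999999999999.
print(value == 92.23999999999999)  # True
print(f"{value:.16g}")  # 92.23999999999999
print(f"{value:.17g}")  # 92.239999999999995
```

Only keeping the value as text preserves the rounded appearance, which is why the answer switches to a string.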
You need to use strings to work with how the value is displayed. However, there are two issues with strings in your code example.
While scalars can normally hold both strings and numeric values, returned scalars cannot hold strings (don't ask me why). Would it be possible to return a local instead?
In %-9.2f you specify a display format 9 characters wide, so the value comes out as "92.24" padded with trailing spaces to 9 characters. You can adjust %-9.2f, but since you are now working with strings you can simply remove the excess spaces with the trim() function.
Try the code below and see whether it works in the context of your command. If not, tell us more about what you are trying to do.
#delimit;
capture program drop my_note;
program my_note, eclass;
local my_x: display %-9.2f 92.23999999999999;
ereturn local my_x = trim("`my_x'");
end;
ereturn clear;
my_note;
ereturn list;

Airtable If-statement outputting NaN

I'm using an If-statement to assign integers to strings from another cell. This seems to be working, but if I reference these columns, I'm getting a NaN value. This is my formula below. I tried adding INT() around the output values, but that seemed to break everything. Am I missing something?
IF(FIND('1',{Functional response}),-4,
IF(FIND('2',{Functional response}),-2,
IF(FIND('3',{Functional response}),0,
IF(FIND('4',{Functional response}),2,
IF(FIND('5',{Functional response}),4,"")))))
Assuming Functional response can only store a number from 1 to 5 as a string, a simple option in Excel would be to first convert the string to a number and then use the CHOOSE function to assign a value. This works because the numbers are sequential integers. Assuming cell K2 holds the value of Functional response, your formula could be any of:
=CHOOSE(--K2,-4,-2,0,2,4)
=CHOOSE(K2+0,-4,-2,0,2,4)
=CHOOSE(K2-0,-4,-2,0,2,4)
=CHOOSE(K2*1,-4,-2,0,2,4)
=CHOOSE(K2/1,-4,-2,0,2,4)
Basically, sending a string that holds a pure number through a math operation makes Excel convert it to a number. Because the operation does not change the value, you get the string back as a number.
CHOOSE is like a sequential IF function: supply it with an integer as the first argument and it returns the value at that position in the list that follows. If the number you supply is greater than the number of options, you will get an error.
Alternatively, you could just do a straight math conversion on the number stored as a string in K2 using the following formula:
=(K2-3)*2
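As a quick check that CHOOSE and the arithmetic formula agree, here is a small Python sketch. The choose helper is a hypothetical stand-in for Excel's CHOOSE (1-based, failing on an out-of-range index), and int() plays the role of the --K2 coercion from text to number.

```python
SCORES = [-4, -2, 0, 2, 4]  # the options passed to CHOOSE after the index


def choose(index: int, *options):
    """Mimic CHOOSE: 1-based lookup that fails when the index is out of range."""
    if not 1 <= index <= len(options):
        raise ValueError(f"index {index} out of range 1..{len(options)}")
    return options[index - 1]


for response in ["1", "2", "3", "4", "5"]:  # Functional response stored as text
    n = int(response)  # the --K2 step: text to number
    assert choose(n, *SCORES) == (n - 3) * 2  # both formulas give the same score
    print(response, choose(n, *SCORES))
```

The arithmetic version only works because the scores are evenly spaced; CHOOSE (or a lookup table) handles arbitrary scores.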
And as a final option, you could build a table and use VLOOKUP or INDEX/MATCH.
NOTE: If the lookup keys in that table (B2:B6) were stored as strings instead of numbers, you would need to use K2 instead of --K2 as the lookup value.

Best way to show blank cell if value if zero

=COUNTIFS(Orders!$T:$T,$B4)
is a formula that returns 0 or a positive count.
I use it across 1500 cells, which fills the sheet with 0s.
I'd like to remove the zeros with the following formula:
if(COUNTIFS(Orders!$T:$T,$B3,Orders!$F:$F,""&P$1&"*")=0,
"",
COUNTIFS(Orders!$T:$T,$B3,Orders!$F:$F,""&P$1&"*"))
This calculates every COUNTIFS twice and increases the calculation time.
How can we do this in one formula that leaves the cell empty if the value is 0 and otherwise displays the result?
I suggest this cell-function:
=IFERROR(1/(1/COUNTIFS(Orders!$T:$T,$B4)))
EDIT:
I'm not sure what to add as explanation. Basically, to replace the result of a complex calculation with a blank cell when it evaluates to 0, you can wrap the complex function in
IFERROR(1/(1/ ComplexFunction() ))
It works by taking the inverse (1/X) of the result twice, which returns the original result in every case except 0, where a #DIV/0! error is generated. IFERROR then catches this error and returns a blank cell.
The advantage of this method is that the complex function is calculated only once, which can give a significant gain in speed and readability. Also, unlike a custom number format, it does not merely hide a 0 that is still the cell's value, which can matter if this cell is used in further formulas.
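The double-inverse trick can be sketched in Python to show its control flow, assuming a ZeroDivisionError is a fair stand-in for the spreadsheet's #DIV/0! error and an empty string for a blank cell. blank_if_zero is a made-up name for illustration.

```python
def blank_if_zero(x: float):
    """Mimic IFERROR(1/(1/x)): return x unless it is 0, in which case return ""."""
    try:
        return 1 / (1 / x)  # a zero count raises ZeroDivisionError, like #DIV/0!
    except ZeroDivisionError:
        return ""  # IFERROR's fallback: an empty cell


counts = [0, 4, 0, 8]  # hypothetical COUNTIFS results
print([blank_if_zero(c) for c in counts])
```

The expensive expression (here just x) is evaluated once; the error path alone decides whether the cell ends up blank.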
You only need to set the number format for your range of cells.
Go to the menu Format-->Number-->More Formats-->Custom Number Format...
In the entry area at the top, enter the following: #;-#;""
The format string has three sections, separated by semicolons:
(positive value format) ; (negative value format) ; (zero value format)
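A rough Python model of how a three-section format string picks a section by sign. It is an assumption-laden sketch: each section is a Python format string rather than the spreadsheet's own # codes, and three_section_format is a hypothetical name. The point it shows is that the zero section changes only the displayed text, not the value.

```python
def three_section_format(value: float, positive: str, negative: str, zero: str) -> str:
    """Choose the positive, negative or zero section by the sign of value.

    The negative section receives the magnitude, so it supplies its own minus sign.
    """
    if value > 0:
        return positive.format(value)
    if value < 0:
        return negative.format(-value)
    return zero.format(value)


# Rough equivalent of #;-#;"" for whole-number counts.
for v in [5, -2, 0]:
    print(repr(three_section_format(v, "{:.0f}", "-{:.0f}", "")))
```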
You can also apply colors, thousands separators, and so on within each section; see the Google Sheets help on custom number formats for details.
Instead of your =COUNTIFS(Orders!$T:$T,$B4), use:
=REGEXREPLACE(""&COUNTIFS(Orders!$T:$T,$B4), "^0$", )
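The REGEXREPLACE approach converts the count to text and blanks out only a cell whose whole text is "0". A minimal Python sketch of that logic, with re.sub standing in for REGEXREPLACE and blank_zero_text a hypothetical name:

```python
import re


def blank_zero_text(count: int) -> str:
    """Mimic REGEXREPLACE(""&count, "^0$", ""): text of the count, with exactly "0" blanked."""
    return re.sub(r"^0$", "", str(count))


print([blank_zero_text(c) for c in [0, 10, 3, 0]])  # ['', '10', '3', '']
```

Note the trade-off this makes visible: every result, including the non-zero counts, becomes text, which can matter if other formulas consume these cells.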
Also, to speed things up you should avoid per-row formulas and use ARRAYFORMULA instead.

Output is '*' when writing a real to a string

I have a real CURRENTTIME I want to convert to a string named TIMEDIR. As TIMEDIR has to change size it is allocatable. As far as I could find out, the allocation works fine. Also, I checked that CURRENTTIME has a value.
ALLOCATE(CHARACTER(LEN=1)::TIMEDIR)
WRITE(TIMEDIR, '(F1.0)') CURRENTTIME
But
WRITE(*,*) TIMEDIR
outputs *, where it should be 0 (CURRENTTIME is 0.0000000). I have no clue what the problem is.
You're writing the output as a floating-point number. Floating-point numbers always have a decimal point or an exponent to differentiate them from integers. Thus the narrowest possible output of a real is 0., i.e. 2 characters, and a format of F1.0 will always result in a "*" being printed, as the field width is insufficient for what is being written.
Ian Bush's answer says what you need to know: output for a real value using the F edit descriptor requires a field width of at least 2. I'll elaborate a bit on some other aspects.
You mention
As TIMEDIR has to change size it is allocatable
but in the code fragment we see
WRITE(TIMEDIR, '(F1.0)') CURRENTTIME
This suggests a little misunderstanding. [It may be that there's no confusion, but I'll labour the point for the benefit of any other reader coming to the question.]
When an output format looks like Fw.d with w greater than zero, the width of the output field is always w. This "always" means: whatever the value of the corresponding variable, the write statement above always puts exactly one non-blank character into TIMEDIR.
Now, as in that other answer, 2 is the minimum field width for output of a real value[1]. As with all other numeric output formatting, if the field isn't wide enough for the corresponding value the field consists entirely of *s. F1.0 will always result in output *. If you want output 0. (or 0,)[2] you'll need F2.0.
Coming back to the "varying size of TIMEDIR", output format F2.0 is (possibly) sufficient for non-negative values of CURRENTTIME less than 10, but for negative values or values not less than 10 it isn't. It may well be that this is where F0.d comes in. It's only with this form of the F edit descriptor that the width of the field depends on the output value. That's probably an answer to another question, though.
Finally, as you mention
I have to find out how to make "0" out of "0."
I'll point out that you're looking at having to do some additional logic, such as mentioned elsewhere.
[1] And even 2 may not be sufficient, even for a zero value: print '(SP,F2.0)', 0. (with SP a plus sign is printed, so +0. needs three characters).
[2] The choice of 0. or 0, depends on the decimal edit mode: print '(DC,F2.0,DP,F2.0)', 0., 0.
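To make the width rules discussed in this answer concrete, here is a rough Python model of Fw.d output. It is only an illustration under stated assumptions: f_edit is a hypothetical name, a leading zero is always printed, the SP and DC modes are ignored, and Python's rounding may differ from a particular Fortran processor.

```python
def f_edit(x: float, w: int, d: int) -> str:
    """Rough model of Fortran Fw.d output (simplified, see assumptions above)."""
    text = f"{x:.{d}f}"
    if d == 0:
        text += "."  # F output always shows the decimal symbol
    if w == 0:
        return text  # F0.d: the field is as wide as the value needs
    if len(text) > w:
        return "*" * w  # field too narrow: filled entirely with asterisks
    return text.rjust(w)  # otherwise right-justified in a field of width w


print(repr(f_edit(0.0, 1, 0)))   # '*'   F1.0, the case in the question
print(repr(f_edit(0.0, 2, 0)))   # '0.'  F2.0 is just wide enough
print(repr(f_edit(-3.0, 2, 0)))  # '**'  no room for the minus sign
print(repr(f_edit(12.0, 0, 0)))  # '12.' F0.0 adapts its width to the value
```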

Destring a time variable using Stata

How to destring a time variable (7:00) using Stata?
I have tried destring: however, the : prevents the destring. I then tried destring, ignore(:) but was unable to then make a double and/or format %tc. encode does not work; recast does not do the job.
I also have a separate string date that I was able to destring and convert to a double.
Am I missing that I could be combining these two string variables (one date, one time) into a date/time variable or is it correct to destring them individually and then combine them into a date/time variable?
Short answer
To give the bottom line first: two string variables that hold date and time information can be converted to a single numeric date-time variable using some operation like
generate double datetime = clock(date + " " + time, "DMY hm")
format datetime %tc
except that the exact details will depend on exactly how your dates are held.
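What clock() produces is a count of milliseconds since 01jan1960 00:00:00.000. The Python sketch below computes the same kind of number so the units are visible. It assumes dates held like "15mar2020" and times like "7:00" (adjust the strptime format to your data), and the name stata_tc is hypothetical. Like Stata's %tc, Python's datetime ignores leap seconds.

```python
from datetime import datetime, timedelta

STATA_EPOCH = datetime(1960, 1, 1)  # %tc counts milliseconds from 01jan1960 00:00:00.000


def stata_tc(date: str, time: str) -> int:
    """Combine date and time strings into milliseconds since the Stata epoch."""
    moment = datetime.strptime(date + " " + time, "%d%b%Y %H:%M")
    return (moment - STATA_EPOCH) // timedelta(milliseconds=1)


print(stata_tc("01jan1960", "7:00"))  # 25200000, i.e. 7 hours in milliseconds
```

A number that large is also why the Stata variable must be created as double: a float cannot hold such millisecond counts precisely.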
For understanding dates and times in Stata there is no substitute for
help dates and times
Everything else tried is likely to be wrong or irrelevant or both, as your experience shows.
Longer answer, addressing misconceptions
destring, encode and recast are all (almost always) completely wrong in Stata for converting string dates and/or times to numeric dates and/or times. (I can think of one exception: if somehow a date in years had been imported as string with values "1960", "1961", etc. then destring would be quite all right.)
In reverse order,
recast is not for any kind of numeric to string or string to numeric conversion. It only recasts among numeric or among string types.
encode is essentially for mapping obvious strings to numeric and (unless you specify otherwise) will produce integer values 1, 2, 3, and so forth which will be quite wrong for times or dates in general.
destring as you applied it implies that the string times "7:00", "7:59", "8:00" should be numeric, except that someone stupidly added irrelevant punctuation. But if you strip the colons :, you get times 700, 759, 800, etc. which will not match the standard properties of times. For example, the difference between "8:00" and "7:59" is clearly one minute, but removing the informative punctuation would just yield numbers 800 and 759, which differ by 41, which makes no sense.
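The arithmetic behind that objection can be checked in a few lines of Python. strip_colon mimics what destring, ignore(:) effectively does, and to_minutes is the 60 * hours + minutes idea from this answer; both names are made up for this sketch.

```python
def strip_colon(hhmm: str) -> int:
    """Drop the colon and read what is left as a number, like destring, ignore(:)."""
    return int(hhmm.replace(":", ""))


def to_minutes(hhmm: str) -> int:
    """Convert "h:mm" text to minutes after midnight (60 * hours + minutes)."""
    hours, minutes = hhmm.split(":")
    return 60 * int(hours) + int(minutes)


print(strip_colon("8:00") - strip_colon("7:59"))  # 41: meaningless
print(to_minutes("8:00") - to_minutes("7:59"))    # 1: one minute, as expected
```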
For a pure time, you can set up your own system, or use Stata's date-time functions.
For a time between "00:00" and "23:59" you can use Stata's date-times:
. di %tc clock("7:00", "hm")
01jan1960 07:00:00
. di %tc_HH:MM clock("7:00", "hm")
07:00
With variables you would need to generate a new variable and make sure that it is created as double.
A pure time less than 24 hours is (notionally) a time on 1 January 1960, but you can ignore that. But you need to hold in mind (constantly!) that the underlying numeric units are milliseconds. Only the format gives you a time in conventional terms.
If you have times of 24 hours or more, using Stata's date-times this way is probably not a good idea.
Your own system could just be to convert string times in the form "hh:mm" to minutes and do calculations in those terms. For times held as variables, the easiest way forward would be to use split and destring to produce numeric variables holding hours and minutes, and then compute 60 * hours + minutes.
However, despite your title, the real problem here seems to be dealing jointly with date and time information, not just time information, so at this point, you might like to read the short answer again.